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Abstract 

A neuron transforms its input into output spikes, and this transformation is the basic unit of computation 
in the nervous system. The spiking response of the neuron to a complex, time-varying input can be 
predicted from the detailed biophysical properties of the neuron, modeled as a deterministic nonlinear 
dynamical system. In the tradition of neural coding, however, a neuron or neural system is treated as a 
black box and statistical techniques are used to identify functional models of its encoding properties. The 
goal of this work is to connect the mechanistic, biophysical approach to neuronal function to a description 
in terms of a coding model. Building from preceding work at the single neuron level, we develop from 
first principles a mathematical theory mapping the relationships between two simple but powerful classes 
of models: deterministic intcgratc-and-firc dynamical models and linear-nonlinear coding models. To do 
so, we develop an approach for studying a nonlinear dynamical system by conditioning on an observed 
linear estimator. We derive asymptotic closed-form expressions for the linear filter and estimates for 
the nonlinear decision function of the linear/nonlinear model. We analytically derive the dependence of 
the linear filter on the input statistics and we show how deterministic nonlinear dynamics modulate the 
properties of a probabilistic code. We demonstrate that integrate-and-fire models without any additional 
currents can perform perfect contrast gain control, a sophisticated adaptive computation, and we identify 
the general dynamical principles responsible. We then design from first principles a nonlinear dynamical 
model that implements gain control. While we focus on the intcgratc-and-firc models for tractability, 
the framework we propose to relate LN and dynamical models generalizes naturally to more complex 
biophysical models. 

Author Summary 

One of the primary goals of sensory neurophysiology is to characterize the stimulus information that is 
extracted and transmitted by a neural system. This is typically done by identifying stimulus features to 
which the system is sensitive, and then finding the probability of response as a function of those stimulus 
features. Together, the relevant features and response define a coding model. Finding a correspondence 
between coding models and underlying neuronal circuitry and biophysics has been elusive. Here, we derive 
the relationship between such a coding model and biophysical parameters in the case of an important 
class of single neuron models. This derivation allows us explore a remarkable property of neural coding: 
the ability of neural systems to represent information in relative units. We find parameters for which this 
property may be perfectly realized by an individual neuron. 

Introduction 

The neuron is arguably the fundamental information processing element of the nervous system, encoding 
a pattern of inputs into a sequence of output spikes. To what properties of the input is the neuron 
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sensitive? What mathematical form does the transformation from input to spiking output take? What are 
the general principles governing that transformation? These coding questions are paired with questions 
of implementation: what biophysical mechanisms support the observed computational properties? Of the 
hundreds of known ion channels expressed throughout nervous systems, why do channels with specific 
kinetics appear with particular densities in particular locations? How do the biophysical parameters of 
neurons map onto their computational function? To start to address these questions, our goal in this work 
is to develop an explicit analytic bridge between a dynamical systems description of neuronal behavior 
and a coding model. 

Integrating a biophysical and computational perspective on single neuron function requires an integration 
of the distinct mathematics of dynamics and statistics. On the coding side, the computational framework 
that we study in this paper is that of the linear-nonlinear (LN) model. The LN model is defined by a 
linear filter representing the feature that is relevant to firing a spike, and a decision function that acts on 
the filtered stimulus to determine the instantaneous firing rate. For a white noise input with fixed mean 
and variance, the unbiased and consistent LN model given the input statistics can be found with reverse 
correlation techniques. When the relevant feature space is low-dimensional, the LN model constitutes a 
complete description of the neuron's computation when the instantaneous firing rate is the only relevant 
output statistic pQ. 

In contrast, the natural language of neuronal biophysics is nonlinear dynamics. In the limit of large 
numbers of channels, the neuron's behavior is described by a potentially high-dimensional set of nonlinear 
differential equations for the evolution of the membrane potential and the state of the ion channels as a 
function of current input [2]- Spikes are stereotyped events in the time series of the membrane voltage. 
We will limit ourselves here to single-compartment models, and to the case in which internal noise is 
negligible and so spikes are generated deterministically in response to current input. 

Although one would like to understand the general relationship between a multivariate dynamical model 
and an LN coding model, for tractability we will start with models with a single dimension that nonethe- 
less incorporate some of the properties of conductance-based models, the leaky integrate-and-fire (LIF) 
and the exponential integrate-and-fire (EIF) models. The LIF model describes the dynamics of the volt- 
age as linear below a voltage threshold, at which spikes are generated instantaneously. This model is 
effective when there is a large separation of timescales between those of the sodium and potassium chan- 
nel dynamics responsible for spike generation and the passive "leak" voltage dynamics of the membrane 
(which includes the membrane properties and any linear contributions to the subthreshold dynamics). 
The EIF model adds to this an exponential subthreshold nonlincarity, providing an intrinsic instability 
that leads to spiking. This model has been shown to fit well to data from cortical neurons OH] and the 
Wang-Buzaki model of hippocampal fast spiking intcrneurons [1HS] ■ 

A key property of neural systems is that they adapt to the statistics of the input [7HT0] . Single neuron 
computation as characterized by LN models also generally appears adaptive in the sense that the optimal 
LN model for predicting neuronal response changes as the statistics of the input change [TTUT5] . We 
will address this property explicitly, focusing on changes in stimulus variance. As the variance of the 
white noise input increases, the optimal filters tend to have shorter timescales and the decision function 
changes slope, or gain |16H19| . We consider in this paper the phenomenon of perfect contrast gain control, 
whereby the decision function adapts to changes in the input standard deviation (contrast) such that the 
fine temporal structure of the firing rate is controlled by the size of the input relative to the input standard 
deviation. Biophysical models, on the other hand, do not adapt in the sense given above: the model does 
not change form when the input statistics are changed. While slow dynamics may cause spike frequency 
changes, these will be included in a complete biophysical model of a neuron via differential equations 
with fixed parameters. We aim to determine under what conditions a fixed neuronal dynamical model 
leads to perfect contrast gain control. 
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In this paper, we accomplish two primary goals. First, we demonstrate how to derive LN models from 
deterministic dynamical models from first principles. The two models are related by a voltage estimation 
problem: the linear filter estimates the state of the membrane voltage, and the nonlinear decision function 
is determined by the interaction of the intrinsic spike-generating currents and the precision of the linear 
voltage estimate. We propose that the proper choice of filter maximizes the precision of the estimate 
of the voltage in the spike-generating regime. We identify the principles behind why the optimal filter 
adapts to changes in input standard deviation, and we derive approximate expressions for the form of the 
optimal filter in various limits. From the perspective of this first goal, the deterministic spiking model 
is taken to be fundamental. The optimal LN model provides a representation of the encoding performed 
by the spiking model but, as an approximation, is necessarily stochastic. 

Second, we change perspectives and take the family of LN models to be fundamental and assume that they 
completely describe the relevant computation of the neuron. From this perspective, the dynamical details 
that the LN model does not predict can provide a mechanism to modulate the optimal decoding of a single 
spike in response to changes in contextual properties of the stimulus. While for small input standard 
deviations, the LIF model is a detector of threshold crossings, we show that for large standard deviations, 
the LIF model generically shows perfect contrast gain control. We identify the general principles behind 
how simple dynamical models can perform this "intelligent" adaptive computation. We then show that 
this general understanding of contrast gain control can be used to constrain kinetic parameters in the EIF 
model to allow it to show contrast gain control without requiring asymptotically large input standard 
deviation. 

The framework we propose to relate LN and dynamical models is quite general. We focus on the integrate- 
and-fire models because they are simple, surprisingly rich, and allow us to obtain analytic results that 
make more general points. However, the fact that more complex neurons can often be reduced to an EIF 
model means that our results have power beyond this simple case. We will discuss how this framework 
extends to higher-dimensional neuronal models. 

Results 

We drove integrate-and-firc models with white noise current with zero mean and standard deviation 
proportional to a, where a is reported in units of the difference between the resting and threshold 
voltages, vth ~ v (see Section: Integrate- and- fire models). Since the mean current can be absorbed into 
the resting and threshold potentials, we fix it to zero and focus only on changes in the input standard 
deviation ("contrast"). Note that while the inputs are drawn randomly, they are completely specified — we 
do not consider any unknown background noise. Thus the response of the dynamical model to the input 
is deterministic and the instantaneous firing rate, R(t), is either zero or dt^ 1 , where dt is the sampling 
time step. 

For each input standard deviation, we characterized the computation performed by the deterministic 
dynamical model with an LN model. As explained in Section: Identifying LN models, an LN model for 
a single neuron produces an estimate of the instantaneous firing rate in response to the input current. 
The LN model consists of two parts: a feature, h(i), that acts on the input to produce a linearly filtered 
stimulus, s(t), which is the amplitude of the feature present in the input, and a nonlinear threshold 
function that acts on the filtered stimulus to estimate the instantaneous firing rate. 

For Gaussian white noise inputs, the spike-triggered average current (STA), defined in Eq. ([SB")) , is the 
consistent and unbiased choice for the filter; the STA of the LN model is the STA of the dynamical model. 
However, the STA is not necessarily the optimally predictive filter. We examined three choices of the 
filter: the normalized STA, which we denote h s ; the membrane filter, h m , introduced in Eq. (|45|) : and 
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the stochastic linearization filter, hi, introduced in Eq. (pH]) . The filtered stimulus s x (t) is defined as the 
convolution of the input current with the filter with corresponding subscript, h x . The threshold function 
that predicts the firing rate — the rate estimation function — is defined relative to a specific choice of filter. 
The filter identification is denoted simply through its dependence on s x : R a [s x ]. 

LN models for the LIF model based on the spike-triggered average filter 

Spike times in the LIF model correspond to the instants when the membrane voltage is above threshold: 
v{t) > vth- We used reverse correlation techniques (see Section: Identifying LN models with reverse 
correlation) to build LN models in which the filter is the spike-triggered average current (STA) defined 
in Eq. (|55|) . Simulations were run with the resting and reset voltages equal: v r — v Q . The results for the 
LIF model are qualitatively insensitive to the values of the parameters used. Results are shown in Fig. 
[TJ^~C for 0.45 < a < 10. 
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Figure 1. LN models for the LIF model for 0.45 < a < 10. A . Spike-triggered average in units of 
('Wisp) _ fpjjg co i or C ode represents input standard deviation and is preserved across all figures unless 

fT-v/ T j dt 

otherwise indicated. C. STA on log scale. The solid black line is the passive membrane filter in Eq. 
(|45|l . Dotted is the stochastic linearization filter in the large-c limit (see Eq. (|22j0 . The STA adapts to 
changes in the input statistics and approaches a fixed form for large a. B. Simulation results for the 
normalized rate estimation function based on the STA filter, R a [s s ] / Ra corresponding to Eq. ([55]) : 
data is scaled by the mean firing rate and plotted with respect to relative stimulus strength, s s /cr. Rate 
estimation functions are graded thresholds. For large a, the LN models show perfect contrast gain 
control as defined in Eq. (f27|) . D. As in B, but for the LN models based on the passive membrane filter. 
Comparison of C and D supports the results about the predictive power of different filters at different 
limits in Fig. 21 as discussed in the text. At low c, the rate estimation functions for the membrane filter 
are more selective (higher stimulus threshold) and are more precise (sharply-peaked) than the 
STA-based models, and are thus more informative, as argued preceding Eq. (jTTJ). For large a, the 
STA-based model is more selective and has a larger dynamic range, giving it greater predictive power in 
comparison to the membrane filter models, as anticipated by the argument preceding Eq. (|19|) . 
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The STA of an LIF neuron when driven by stimuli with different variances has been fairly well-studied, 
numerically and analytically [20J|2I]. The STA is an integrating filter for all input standard deviations. 
The detailed shape of the STA can be decomposed into an exponential tail at longer times and a rapid 
rise at short times, as shown in Eq. (|81|) and Fig. 1101 Both components depend on the input standard 
deviation. In all cases, at times very near the spike time and short compared to the membrane time 
constant, r, the STA shows a rapid upward fluctuation whose peak value is proportional to a and depends 
on the correlation time of the input (see Section: Discretization and regularization for discussion). This 
singular component of the STA is associated with crossing threshold from below 20 ,22 23 and its detailed 
shape is due to the discontinuous dynamics of the spike [2"Tll2~ill2"5] . The longer-time behavior of the STA 
is exponential. For small a only, the timescale of the exponential is given by the membrane time constant, 
t. For finite a, the timescale of the STA is always less than the membrane time constant. At large a, 
the STA goes to a fixed form — the normalized STA becomes invariant to changes in the input standard 
deviation. 

Rate estimation functions were built using Eq. ([55)1 based on the STA-filtered stimulus, s s (t). Quali- 
tatively, the rate estimation function based on the STA-filtered stimulus is a graded threshold, with a 
form that depends on a. As a is increased, there is a shift in the minimum value of the filtered stimulus 
that permits spiking (a so-called "sub-tractive" gain change). For small a, the rate estimation function is 
non-monotonic, and first becomes non-zero near a threshold, Sth = V^{vth — v ). For large a, the rate 
estimation function increases monotonically and becomes linear in s s for large s s . In terms of the relative 
stimulus strength, s s /a, the rate estimation function shows perfect contrast gain control as defined in 
Eq. ((27)) : after scaling out a multiplicative gain factor — the mean firing rate — the threshold function is 
invariant to changes in the input standard deviation. 

Thus, the STA-based LN models describe the encoding of the LIF model as leaky integration followed 
by thresholding; furthermore the encoding is adaptive, with both the STA and the threshold function de- 
pending on the input standard deviation. We will return to the observation of perfect contrast adaptation 
in Section: Context-dependent coding in integrate- and- fire models. 



Derivation of rate estimation functions from the voltage 

Our first goal is to derive the components of the LN model, as sampled numerically from a dynamical 
model, directly from that underlying model. In the special case of single neurons, we can use the fact that 
spikes are defined in terms of voltage state. Previous work has focused on how the filter is determined 
by the dynamics [I1I2TH23, 26 28]. Now, we will first derive expressions for the threshold function of the 
LN model for a generic filter, which we call here the rate estimation function, and then return to the 
question of the appropriate filter. 

The fundamental insight of this work is to recognize that the estimated firing rate produced by an LN 
model can be derived from the dynamics of the voltage, by averaging over all voltage trajectories whose 
evolution is consistent with the occurrence of a spike at time t and the filtered stimulus taking on the 
observed value, s x (t) for filter h x : 

Ra[s x {t)] = [Vv(t')p a [v(t')\s x (t)] R[v(t')\. (1) 

The integral is over all voltage trajectories, weighted by the probability that the trajectory, v(t'), is 
consistent with the value of the filtered stimulus at time t, p a [v(t') \s x (t)\ . The term R[v(t')] represents 
the instantaneous firing rate at time t for a given voltage trajectory; it is non-zero only at events in the 
voltage trajectory that are defined to be spike times. Thus, mapping the dynamics onto the coding model 
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requires (1) specifying the mapping from a voltage trajectory to unique spike times, and (2) understanding 
the relationship between the voltage and the filtered stimulus. 

Rate estimation functions for the LIF model 

For the leaky integrate-and-fire neuron, the transformation from voltage to spikes is simply via threshold- 
crossing, and the rate is given in terms of the voltage by 

R[v(t)] = jU[v(t) - v th ] , (2) 

where dt is the sampling time step, vth is the threshold voltage, and H[. . .] is the Hcaviside step function; 
i.e. for time bins dt, spikes occur with probability one when the voltage is above Vth and zero otherwise; 
see Section: Discretization and regularization for relevant discussion. Then Eq. (JT]) simplifies to 

K[s x (t)} = J dv(t)R[v(t)} Pa [v(t)\s x (t)] . (3) 

Eqs. © and (O combine to give: 

1 f°° 

R*[3x(t)] = — / dv{t)p a [v{t)\s x {tj\ , 

dt J vth 

= j f P*[v{t) > v th \s x (t)] , (4) 

where P a [v(t) > Vth\s(t)] is the probability that the voltage is above threshold given the value of the 
filtered stimulus (throughout this work, we use the convention that probability densities are denoted with 
lowercase p and true probabilities are denoted with uppercase P). 

To proceed, we need to analyze p„ \v{t)\s x {£)\ . The form of Eq. (TJ| suggests that the filtered stimulus 
acts as a linear estimator of the state of the voltage at the time of a spike. While no general closed form 
solution is available for this voltage estimation problem, in Section: Moment-based asymptotic results, 
we derive useful asymptotic results at small and large a. First, however, we discuss the exponential 
integrate-and-fire model and the question of optimal filter choice. 

Rate estimation functions of the EIF model 

While spike times are very clearly specified in terms of the voltage for the leaky integrate-and-fire model, 
this is not generally the case for more realistic neurons; the spike is an extended waveform that is 
stereotyped but always shows some variability. This is true even for the exponential integrate-and-fire 
model: only the peak of the spike is uniquely specified and there is variability in the spike onset. To 
identify an equivalent function relating rate to voltage in this more realistic case, we must identify an 
effective threshold above which the spike waveform is stereotyped. 

For inputs whose standard deviation goes to zero, the EIF model has an intrinsic voltage threshold 
whose crossing times provide the natural definition for the onset time of the spike. The unstable fixed 
point, v t h, is the separatrix between subthreshold and spiking trajectories, known as the dynamical 
threshold [2,22.28 30 . At low input noise, the stimulus brings the system to and across the dynamical 
threshold, above which the spiking trajectory is essentially independent of the stimulus. The spike- 
generating dynamics are controlled by two parameters: the voltage activation scale for the exponential 
current, A, and the after-spike reset voltage, v r (see Eq. l4"2j) . 



7 



For larger input standard deviations, there is a significant probability that trajectories that cross the 
dynamical threshold may cross back without spiking, and so we need a more appropriate definition 
of threshold. To identify LN models that best represent how the input drives spiking, we want to 
identify spike times that separate the role of subthreshold integration from super-threshold, largely input- 
independent dynamics. We define a stochastic dynamical threshold, Vth,a to be the voltage beyond which 
fluctuations in the input that occur immediately following the crossing point have a fixed, low probability 
of aborting the voltage from reaching the spike peak: 



Here C is a number between 0.5 and 1, and the transition probability from time t to t + dt is given in 
Eq. (|5Tj) . While this rule only considers the instantaneous evolution of the voltage at the threshold, the 
model's exponential nonlinearity ensures that future time steps above threshold arc exponentially more 
likely to continue to the top of a spike. So, for an appropriate choice of C (to be discussed shortly), this 
local definition identifies spikes reliably. Integrating Eq. (|5|) gives an implicit equation for the threshold 
as a function of the noise strength: 



There are two roots to this equation; the threshold is the one such that Vth,a > vth- The unstable fixed 
point Vth is recovered for C = 0.5: it is the voltage from which all perturbations have an equal chance 
of either stepping toward the spike peak or toward the stable fixed point in the next time step. For 
corrections to the threshold that are small relative to the activation scale A, the stochastic dynamical 
threshold is 



which is linear in the standard deviation. The position of the stochastic dynamical threshold is controlled 
by the spike-generating dynamics, and is determined by the magnitude of the deterministic excitable 
current needed to compensate for input fluctuations that work to abort a spike. 

As discussed in Section: Discretization and regularization, the v dt^ 1 divergence of the stochastic dy- 
namical threshold arises from the white noise statistics of the input. If the continuous time limit is 
taken without regularization, the derivative of the voltage can be arbitrarily large and so no finite volt- 
age threshold satisfies the criterion in Eq. (0. This expression is sensible for inputs correlated on the 
timcscalc dt, provided dt is small. 

To construct an LN model, we need to specify the confidence, C, that determines the threshold used to 
identify spikes. In a real neuron, a natural definition of threshold corresponds to the peak voltage of 
the minimal event that propagates down the axon, and C should be set accordingly. Here, the choice 
of C is somewhat arbitrary. The threshold should identify event times that correspond to true spikes 
(those that reach v s ) with a low fraction of mislabeled aborted spikes. This criterion implies C — ^ 1, 
corresponding to triggering spikes at the peak voltage. However, for the purposes of constructing a model 
that best characterizes the relationship of spiking to the input, one would like to set this threshold as low 
as possible, in order to minimize the relative effect of the stimulus-independent internal spike-generating 
dynamics. This would imply C — > 0.5, corresponding to identifying spikes at the intrinsic dynamical 
threshold Vth- 

We chose C = 0.95 to balance these competing requirements. For this value, the odds of misclassifying an 
aborted spike as a spike are approximately 1 in 150, and so the false-positive rate is lowQ- We verified this 

x At the temporal discretization we used, and for any finite discretization, the false-positive rate is much lower than 




(5) 




(0) 




(7) 
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procedure by computing how predictive LN models based on different choices of spike identification are, 
using the information criterion in Eq. (|60|) . LN models that predict spikes on crossing the corresponding 
stochastic dynamical threshold are 1 — 10% more informative about spiking than LN models based on 
spike times defined at the peak voltage (not shown), although results are qualitatively insensitive to the 
exact choice of threshold. Sec Fig. [2] for example traces and discussion in context. 
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Figure 2. Stochastic dynamical threshold in the EIF model. Example voltage traces for the 
EIF model; voltage in units of Vth — v o- A, a — 0.45. B, a = 1.5. The dynamical threshold in the 
deterministic limit, Vth = 1 in normalized units, is the solid black line. The 95% confidence threshold 
(C = 0.95) as defined in Eq. ([6]) is shown dashed. There are multiple crossings of the deterministic 
dynamical threshold that do not correspond to spikes, whereas the stochastic threshold reliably 
identifies spikes. The stochastic threshold is adaptive in that it depends on the input standard deviation. 
Some non-spiking events shown in panel B would be labeled spikes using the threshold in panel A. 



Armed with this treatment of spike time identification, we return to the derivation of the rate estimation 
function for the EIF model. Determining spike times based on the crossing of the stochastic dynamical 
threshold requires knowing the voltage at two adjacent times. Eq. (TT|) thus simplifies to: 

R<r[s x (t)} = Jdv(t-dt) Jdv(t)R[v(t-dt),v(t)]p a [v(t-dt),v(t)\s x (t)], (8) 

where the rate based on the voltage is: 

R[v(t - dt), v(t)] = ^[thh.a - v(t - dt)] R[v(t) - vt h A ; (9) 
at 

spike times are labeled with probability one when the voltage goes from below threshold to above threshold 
in a single time step; this is equivalent to the condition that spike times corresponding to crossing a voltage 
threshold with v > at the time of the spike. In Section: Dynamics conditioned on the filtered stimulus, 
we develop Eq. (J8j) to get: 

R a [s x (t)] = ^^p„[v tht(r \ Sx {t)], (10) 
\l2-KTdt 



the 1 in 19 that might be expected because most spiking events do not start from exactly the threshold voltage, but from 
slightly larger voltages, which are more likely to correspond to true spikes. 
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where (3(C) is a constant that depends on the confidence, C, used to define the stochastic threshold and 
is defined in Eq. (|73|) . 

Eq. (fit))) has a simple interpretation analogous to the more intuitive result for the LIF model in Eq. 
(|4]). The estimated firing rate is proportional to the predicted probability density at threshold, and the 
CT-dependent coefficient accounts for the range of voltages near threshold that are accessible in a single 
time step, defining an effective "resolution" of the threshold voltage state that depends on the interaction 
of the input strength and the spike-generating dynamics: 



The origin of this effective threshold resolution is straightforward. First, consider C = 0.5, for which 
(3(0.5) = 1 and Vth,a — v th, corresponding to spike times defined by crossing the intrinsic dynamical 

threshold. At vth, the voltage is described by an unbiased random walk. The resolution, 8v t h = cry 



is the mean voltage change in one time step from vth to voltages above Vth , and this gives the characteristic 
range of accessible voltages during the instant defining the spike time. For C > 0.5, (3{C) < 1, and so 
requiring increased confidence that the labeled spike time corresponds to a true spike decreases the 
characteristic range of voltages at the spike time — spikes are more precisely defined. Using the effective 
resolution, the estimated firing rate in Eq. (flQ|) can be be re-expressed as: 



where P a [v(t)G {vth,a} \ s x (tj\ is the predicted probability of finding the voltage in the threshold-crossing 

Set, {v t h,a} ~ {V t h,a < V(t) < V t h, a + Sv t h,a}- 

Results for the STA-based LN models triggered on the stochastic dynamical threshold are shown in Fig. 
[3] for 0.45 < a < 2. For a larger than shown, the EIF model behavior degenerates to the LIF model 
behavior, and is independent of f(v) provided A <C v s — v t hU The STA of the EIF shows more complex 
adaptive behavior than that of the LIF model. At small standard deviations, the STA is non-monotonic, 
extends for times long compared to the membrane time constant r, and peaks well before the spike. 
For intermediate input standard deviations, the STA becomes approximately exponential except at short 
times, with time constant close to r. The rate estimation functions show that the EIF model is less 
selective to the filtered stimulus than the LIF model, as shown by the wider range of filtered stimulus 
values that cause spiking. 

The optimally predictive filter 

Our expressions for the rate estimation functions of intcgratc-and-firc models, Eqs. ((4]) and (|10[) . hold for 
any choice of filter. However, in order to construct the LN model that is the most precise predictor of 
the spike times of the dynamical model, we need to identify the most relevant filter. This optimal filter 
maximizes the information between the filtered stimulus and a spike: 





(11) 



hopt (t) = arg max l a [sp; s x ] 
Kit) 



(12) 



with the information given by 




(13) 



2 We do not show larger inputs, cr > 2, because they are "unphysiological" in that input-driven fluctuations overwhelm 
the excitatory current, f(v), and approach the amplitude of the spike itself. 
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Figure 3. LN models for the EIF model for 0.45 < a < 2 using the parameters in Eqs. (|40"]) and 
(|4lT) . A . Spike-triggered average in units of ^/^ sp ^ . Color represents a and is consistent with Fig. [TJ 

(T-y/ r I dt 

C. STA on log scale. Solid black is impulse-response filter at rest with time constant r. Dotted is the 
stochastic linearization filter in contrast-invariant regime, k a = 1.9 (see Eq. (|25p). B. Simulation results 
for the rate estimation function based on the STA filter, i? CT [s s ] corresponding to Eq. ([55]); y-axis in Log 
and scaled by mean firing rate; x-axis in units of relative stimulus strength, s s /a. As in the case of the 
LIF model, estimated rate functions are graded thresholds. For intermediate input strength, 
0.8 < a < 2, this choice of EIF parameters leads to LN models that show nearly perfect contrast gain 
control. D. Mean firing rate as a function of input standard deviation. The EIF model parameters for 
contrast gain control were derived by requiring linearity in the mean rate for intermediate input 
strengths, a, and over as large a range in firing rate as possible (see Section: Contrast gain control in 
the EIF model). 



Here H[s x ] is the entropy of the filtered stimulus distribution [31] and if[s K |sp] is the entropy of the spike- 
triggered filtered stimulus distribution; equivalent forms are given in Eqs. (|5"9"|) and ([50)1 . The optimally 
predictive filter is the maximally informative dimension first introduced by Sharpee et al |32j . 

Given the definition of the rate estimation function in Eq. (|10j) , the information per spike is equivalent to 
the information provided by the instantaneous value of the filtered stimulus about the threshold voltage 
state, represented by the likelihood, P a [v(t)£ {vth,a} \ s x (t)]. With an expression for the likelihood, the 
optimal filter can be derived directly from the underlying dynamics. Unfortunately, the complexity of 
the expression for the conditional voltage distribution in Eq. (|73|) prevents us from obtaining general 
analytic results. However, we will proceed to develop intuition to guide derivations in limiting cases. We 
start with general considerations from information theory. 

If nothing is known about the underlying dynamics, one can assume that the STA is the optimal filter. 
In Eq. (|13[) . the stimulus entropy, H[s x ], only depends on the input a and is independent of the shape of 
the filter, and so maximizing the information minimizes the spike-triggered stimulus entropy, _ff[s x |sp]. 
Assuming no knowledge of the dynamics and only that ^[s^lsp] is smooth and bounded, the maximum 
entropy (agnostic) assumption about the spike-triggered distribution is that it is Gaussian and character- 
ized by the variance, Var[s^|sp]. Under the maximum entropy assumption, maximizing the information 
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is equivalent to minimizing the variance of the spike-triggered distribution. The spike-triggered variance 
given a spike at t is: 

Var[s x \sp}= / — / — h x (t-t')h x (t-t")[(l(t'-t)I(t"-t)\s V )-{l(t'-t)\sp){l(t'-t)\sp)]. 
Jo T Jo T 

Since |sp) is positive-definite for any underlying dynamics, the variance is minimized when 

the projection of the filter onto the STA is maximized, and thus the optimal filter is the time-reversed 
STA: 

ho P t(t-t') oc (I(t'-t) | sp>, 

= h s (t-t'), (14) 

where h s is the properly normalized filter. More generally, if the rate estimation function, R&[s x }, is 
invertible, then, even if p[s,,,|sp] is non-Gaussian, the STA is still the optimal filter. This is true because 
invertible transformations from filtered stimulus to estimated rate are information-preserving, and so 
the optimization problem reduces to the Gaussian problem above. This minimax solution provides one 
generic limit from which to study the role of the dynamics in determining the optimal filter. 

A second generic limit occurs when the rate estimation function is non-invertible and is only non-zero for 
filtered stimulus values above some threshold: s x > sth- If there exists a filter such that sth 3> cr, then the 
information per spike from Eq. (|59[) is dominated by the information at the stimulus threshold: 

Rg [fx] 

R(T 



tLN\ / f i -ji-. R<r[ s x] ■, 

(7 [sp;s x (t)}= ds x — e ^—g — log 2 
J Sth V2ira 2 tia 



1 R a [s t h] , 

log 2 



\/2tT Ra 



Rg [Sth] 
Ra 



because the contributions from s x > s t h arc exponentially rare in the Gaussian stimulus ensemble. 
Maximizing the information in this limit is equivalent to maximizing the likelihood ratio of the voltage 
at threshold given the filtered stimulus at threshold; using Eq. ((TU)) one obtains 

h 0p t(t) ->• argmax — . 15 

K m (t) Pa[ve{Vth,<r}} 

In this rare-events limit, the optimal filter best predicts spikes for so P t = Sth, and the quality of the 
predictive power of the filter for so P t > s t h is irrelevant. The optimal filter may differ from the STA 
if the dynamics that lead to spiking for filtered stimuli near threshold are distinctly different from the 
dynamics corresponding to spikes at larger values of the filtered stimulus, as may be the case when larger 
inputs cause significant inter-spike interactions [20ll33l l34] . 



The details of the dynamics will determine which generic limiting optimal filter is closer to the true optimal 
filter. As a proxy for understanding the information maximization problem and to develop intuition, it 
is useful to understand how the voltage and instantaneous firing rate correlate with the filtered stimulus. 
First, we write the input in terms of the dynamics, using Eq. (|42l) : 



i(t) = rv{t) + v(t) -v a + (v th - v r )TR(t), 
= ^(h' 1 * (v - Vo ))(t) + (v th - v r )rR(t), 



where we use the inverse convolution notation as in Eq. (|64[) . In this form, the input current is seen to 
drive the evolution of two correlated parts of the dynamics: the voltage and the instantaneous firing rate. 
The degree of correlation between these terms depends on the timescale. For the shortest timescales of 
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order dt, the voltage and rate are strongly correlated because the dynamics are deterministic and the 
rate is instantaneously non-zero only at spike times. However, since the LIF model is a renewal process 
and inter-spike intervals arc independent, the temporal evolution of voltage decorrclates with the firing 
rate over multiple spikes — correlations between the voltage and firing rate on timescales comparable to 
the mean inter-spike interval are small. This statement is supported by the exponential decay of the 
spikc-triggcrcd voltage (the rate- voltage correlation function), shown in Fig. [9l In terms of the LIF 
model dynamics, the filtered stimulus is: 

s x (t) = (h x * i) (t) = V2 (h x * (h^ 1 * (v - v )) ) (t) + r(v th - v r ){h x * Ft) (t). (16) 

Eq. (|16p is useful because it shows how the value of the filtered stimulus correlates with the dynamics. 
Due to the low-pass filtering performed by h x , the filtered stimulus is only correlated with the dynamics 
on timescales comparable to the filter duration. Given the decaying dynamical correlations on the filter 
timescale, we can approximate the voltage and rate terms as weakly correlated. 

Optimal filter for the LIF model for small a 

For small a, the LIF model operates in the rare-events limit, R a r <C 1, and spikes are only fired in 
response to large deviations of the input current. In this limit, the optimal filter satisfies the likelihood 
optimization criterion in Eq. (|15|) . For small-but-finite a, the optimal filter is the membrane filter, 

hopt = h m , (17) 

where h m is defined in Eq. f|45|) . For the membrane filter, Eq. (fTB]) simplifies to 

S m (t) = V2(v(t) - Vo) + (vth ~ v r )T(h m * R){t). 

Since the term involving the rate is positive-definite, the minimum filtered stimulus that can cause 
spikes — the stimulus threshold — is 

s t h = V2(v t h - v ), (18) 
and all spikes are fired for s m > Sth- For isolated spikes, the term involving R{t) is essentially zero, the 
filtered stimulus is approximately perfectly correlated with the voltage trajectories leading to the spike, 
and so P a [v£ {v t h} | s t h] ~ 1- Thus, for a -C Sth, the membrane filter is the optimally predictive filter. 
The membrane filter is guaranteed to be the optimal filter until a approaches Sth, at which point the 
assumption leading to the likelihood optimization criterion breaks down. 

Furthermore, in this limit, the reset current after a spike is large compared to a and so induces a strong 
relative refractory period. Because of this, the rate estimation function will have a sharp peak at s m = Sth ■ 
Since s m (t) evolves with finite correlation time, values of s m (t) > Sth generally follow recently-fired spikes 
caused by smaller values of the filtered stimulus, and thus larger values of s m cause spikes with reduced 
probability. 

To summarize, for small-but-finite a, one recovers as one should an LN model that is independent of the 
stimulus statistics and that implements the integrate-and-fire computation: the filter is the membrane 
filter and spikes are predominantly fired when the filtered stimulus crosses s t h- See Fig. [TTB and 
accompanying details in Section: Moment-based asymptotic results. In this regime, the computation 
performed by the dynamics is dominated by the evolution of the voltage between spikes and is insensitive 
to inter-spike interactions. However, the STA is no longer optimal for small-but-finitc a. The STA 
is influenced by contributions from above-threshold stimuli that are not relevant for maximizing the 
information. Except when a — > strictly, the STA differs from the membrane filter, as shown in Eq. 
(f8T|) . Accordingly, the STA-bascd models are less informative than the membrane filter models for small 
a, as shown in Fig. 0J3. The STA-based LN models are less precise, with lower stimulus thresholds and 
less selective rate estimation functions, as shown by comparison of panels B and D in Fig. [TJ 
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Figure 4. Predictive power of LN models with various filters for the LIF model. A. 

Coincidence factor denned in Eq. (|57[) versus input standard deviation (log scale). B. Information per 
spike for each LN model relative to the information per spike in the LIF model spike train (see Eqs. 
(|58|) and ([60]) ). Membrane filter (dots); STA filter (squares); stochastic linearization filter (diamonds). 
Fig. A shows the coincidence factors comparing the LIF model spike trains to spike trains produced by 
LN models based on the intrinsic membrane filter, stochastic linearization filter, and the STA filter for 
temporal resolution 7 = r/20. At low input standard deviations, the optimally predictive filter is the 
membrane filter. At large input standard deviation, the STA is the optimal filter. The crossover occurs 
near a = 0.6. The stochastic linearization filter from Eq. (|2Tj) has been included to show how including 
the mean influence of spike history on the filter captures qualitatively the adaptive behavior observed 
empirically. Fig. B shows the fraction of the information captured by the LN models relative to that 
conferred by the LIF model spike train for temporal resolution r/40. Trends match those of the 
coincidence factor, demonstrating that the fraction of information is a good measure of the predictive 
power of the LN models in addition to being a good measure of the completeness of the encoding 
representation. Absolute information per spike in the LIF model ranges from 11.5 bits/sp to 3.2 bits/sp 
for the range of a and temporal resolution shown. At high a, the relative information for the 
STA-based LN model tends toward 1, demonstrating that the LIF model is increasingly well-described 
by the optimal LN model at high firing rates. 



Optimal filter for the LIF model for large a 

For larger a, the firing rate is large, R a T > 1, and inter-spike interactions play a significant role in the 
dynamics. In this limit, the STA is the optimal filter: 

h 0p t = h s . (19) 

This is guaranteed to be approximately true because of the generic minimax argument leading to Eq. 
(Q3J). In Eq. (|2"9")l . we show that the rate estimation function is approximately linearly proportional to 
the filtered stimulus. Thus, the spike-triggered distribution of the filtered stimulus is a skewed Gaussian, 
p[s x |sp] ~ s x p[s x ], and the optimal filter will be close to the Gaussian solution: ho P t ~ h s . For a > 0.6, 
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we find that the STA-based LN models have greater predictive power than models based on the membrane 
filter, with predictive power increasing for increasing a , as shown in Fig. 21 

For a 3> Vth ~ v Q , we argue from the dynamics that the optimal filter is exactly the STA. Examine Eq. 
(|16p to infer how the filtered stimulus is correlated with the dynamics for filtered stimuli that correlate 
with spikes] this question corresponds to considering inputs for which the filtered instantaneous firing rate 
term in Eq. (|T6|) is transiently (on timescales of the filter) non-zero. First, for any filter h x , for strong 
inputs in which the filtered firing rate term is transiently large {[h m *R)^> V^Ra) and thus s x a, the 
value of the filtered stimulus is given primarily by the firing rate term and the voltage term is negligible. 
For transiently high firing rates, the voltage term is approximately fixed at Vth ~^^z~ Vr {<<c_ a): the voltage 
cycles between v r and vth and the contribution from the derivative of the voltage, {h x * w), goes to zero 
since the mean voltage is pinnedH Thus, considering only large values of the filtered stimulus, the STA 
is the optimal filter — it maximizes the correlation with the dynamics of the firing rate on timescales of 
the filter and is not explicitly sensitive to the details of the voltage dynamics. 

Second, consider when the filtered firing rate term trends transiently to zero ({h x * R) < V2R a ) and 
most voltage trajectories do not correspond to spikes. The minimal spiking trajectories are those that, 
on the timescale of the filter, evolve linearly from sufficiently far below threshold to hit vth- From 
the average minimal spiking trajectory assumed to start below rest, v(t) ~ v + (vth — v)h x (t) with 
v ~ (v\v < v ) = —c/Vk, we infer the approximate stimulus threshold for the onset of spiking using Eq. 
(USD: 

s th ~ vJ^, (20) 

as is verified in Fig. [T2R. In the previous paragraph, we observed that the STA is the optimal filter for 
large values of the filtered stimulus, and Eq. (|20p shows that there are no small values correlated with 
spiking. Thus, for a 3> Vth ~ v o, the optimal filter is the STA. Metrics describing the predictive power of 
the STA-based LN model are shown in Fig. @] 



Filter adaptation in the LIF model 

For a < 0.6, the optimal filter is the membrane filter and the optimal LN model is approximately 
independent of the input statistics. For larger a, the STA is the optimal filter, and furthermore, the STA 
adapts with changing a, as shown in Fig. [TJ The LIF model itself is fixed, and so the filter adaptation 
must emerge from the interaction of the input and the nonlinear dynamics. To elucidate the principles of 
filter adaptation, in Section: Adaptation of the optimal filter, we derive the stochastic linearization filter, 
hi, starting from Eq. (|76j) . The stochastic linearization filter is an approximation to the true optimal 
filter and accounts for the mean influence of inter-spike interactions on the optimal filter. While it is 
a sub-optimal predictor of spikes for larger a, it captures the qualitative properties of filter adaptation 
across all a and allows for easy interpretation. 

The stochastic linearization filter is the exponential filter that maximizes the correlation of the voltage 
and the filtered stimulus: 

^H(t), (21) 



with modified time constant: 



1 + (vth - Vr)rR a (v th - (v)) ^ 



3 We also use this argument to derive an asymptotic closed form for the rate estimation function, Eq. H29I I. which is 
verified in Fig. If 2R .C; see also Section: LIF model: large input standard deviation. 
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The dependence of the integration time scale k a on a captures filter adaptation in the LIF model by 
accounting for the perturbation to the linear response by the mean influence of the post-spike current. 
Relative to the membrane time constant (k = 1), the optimal filter time scale changes based on cor- 
relations of the nonlinear after-spike reset current with the voltage. The stochastic linearization filter 
time scale is always shorter than the membrane time constant {k a > 1) because the correction term 
is positive-definite. Intuitively, the filter is shortened because spiking decorrelates the voltage and the 
input, leading the system to "forget" past inputs faster than it would by leak alone. For the LIF model, 
this forgetting is the primary mechanism of filter adaptation, and it is further enhanced in the STA, 
where there is an additional suppression of integration at all but the shortest timescales. More details 
are given in Section: Semi- empirical closed form for the STA of the LIF model. In Fig. |4l we show 
that the stochastic linearization filter is informative across all a and thus quantitatively captures the 
phenomenology of adaptation. 

Optimal filter for the EIF model 

For finite firing rates, the optimal filter for the EIF model with the parameters in Eqs. (|40|) and (|41[) is 
always approximately the STA. This follows immediately from the general minimax argument leading to 
Eq. (|14|) . As shown in Fig. [3l the rate estimation functions are approximately exponential, and so the 
spike-triggered stimulus distribution is approximately Gaussian since it is the product of a Gaussian and 
an exponential, p[s x |sp] ~ e Sx _p[s x ] ~ M . Thus, for all a shown: 

hopt ~ h s . (23) 

As with the LIF model, the STA changes with a due to the interaction of the input and the nonlinear 
dynamics responsible for the spike. To develop intuition, we use the trick established in Eq. (|16|) to write 
the filtered stimulus in terms of the dynamics: 

s x (t) = (h x * i)(t) = V2 (h x * (h- 1 * {v - « )))(i) + (h x * /(«))(*) + T(v th - v r )(h x * R)(t). (24) 

For a <C Vth ~ v o, the firing rate is low and so the rate term can be ignored. However, unlike for the 
LIF model, the optimal filter in the small a limit is not the membrane filter. When the voltage is near 
the threshold state, Vth, the nonlinear spike-generating current is large compared to the input current: 
f(v) « Vth — v Q a. The optimal filtered stimulus must be strongly correlated with the spike-generating 
current. 

In previous treatments of the small-cr limit, the STA was approximated by the most likely spike-triggered 
current [271135] . The most likely spike-triggered current is independent of a and is nonlinear ly determined 
by the passive membrane and the form of f(v). On the approach to a spike, as f(v) increases, less input 
current is required to drive the spike than would be the case for a purely passive membrane. This leads 
to non-monotonic STAs that peak well before the spike time. Furthermore, the excitable nonlincarity 
increases the influence of distant inputs preceding a spike because the nonlinearity reincorporates the 
input through the voltage itself. This extends the duration of the filter to timescales beyond that of the 
passive membrane, Fig. [3]^, [27] . 

For intermediate-strength inputs, a ~ v t n ~ v o and A <g; v t n ~~ v , the optimal filter will be close to 
the membrane filter. We can argue this based on Eq. (|24p . For voltages above threshold, /(i>(i)) and 
R(t) are strongly correlated and largely independent of the input. So, the optimal filtered stimulus 
will be most strongly correlated with the passive membrane dynamics that are primarily responsible 
for the approach to threshold. The STA persists as the approximately optimal filter because the STA 
too becomes approximately exponential in this regime. For the parameters studied, for intermediate <r, 
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voltage fluctuations from the passive membrane integration are comparable to or larger than the changes 
in voltage caused by f(v) until precisely when f(v) dominates and causes a spike, independent of the 
input. The STA thus primarily reflects passive integration approaching threshold. 

Stochastic linearization can again be used to describe how the optimal filter adapts to changing input 
statistics. For the EIF model, the modified inverse time constant is: 

k _ 1 _ ( (v - (v\v < v t h)) f(v)\v < vth) (v t h - v r ) T-Rg (v th - (v\v < vth)) , 25 x 
Var[u|w < Vth] Vax[v\v < v t h] 

as is derived in Section Adaptation of the optimal filter. Eq. (|25|) captures the fundamental adaptive 
behavior of the optimal filter for the EIF model. Again the term involving the mean rate is positive- 
definite and describes the more rapid " forgetting" implemented by the optimal filter due the decorrelation 
of voltage with past inputs due to spiking. The new term involving subthreshold f{v) reflects the influence 
of "active" subthreshold nonlinearity. For small a, it is negative and thus acts to increase the optimal 
filter timescale, Fig. [3J For the parameters simulated in Fig. [31 the subthreshold f(v) term is close to 
zero for a ~ Vth — v and so forgetting dominates and k a is always larger than fco Q 

Context-dependent coding in integrate-and-fire models 

In the previous sections, we demonstrated the principles that allow one to relate the optimal LN model 
to a spiking dynamical model. Now, we switch emphasis. We treat the LN models as the fundamental 
representation of the computation and discuss the properties of the integrate-and-fire models as an imple- 
mentation of the code. In this perspective, the neuron selects from its inputs the component proportional 
to the optimal filter. The size of the relevant signal is proportional to the filtered stimulus, so P t(t), and 
the remainder of the input acts as a background noise with standard deviation proportional to a (see 
Eqs. (|67|) and (|68]) ) . Because the optimal filter and corresponding rate estimation functions depend on 
cr, these very simple dynamical neurons effectively perform context-dependent coding: the signal that the 
neuron encodes and the properties of the encoding into spikes depend on the background in which the 
signal is embedded. 

LIF model for small a: absolute thresholding 

As previously discussed, when the input fluctuations are small, a < 0.6, the LN model encoding repre- 
sents absolute threshold detection using the membrane filter, h m (t): the neuron spikes when the filtered 
stimulus crosses Sth = V^(vth — v ). In this regime, the threshold is always large, s t h 3> o and most 
spikes are independent; and, the optimal filter and rate estimation function are approximately indepen- 
dent of the input standard deviation. This reflects the fundamental deterministic nature of the LIF 
model: when spikes are well-separated, the LIF model performs the integrate-and-fire computation in 
which the dynamics are linear integration to a fixed threshold. 

LIF model for large a: perfect contrast gain control 

In coding terms, the LIF model performs a surprisingly sophisticated computation when the background 
is large. For a ^s> Vth ~ v , the LN models show perfect contrast gain control. Of all possible families 

4 For A > t> t h — v , the EIF model tends toward the quadratic integrate-and-fire model, and the subthreshold f(v) term 
has a strong influence on the optimal filter 1281 . 
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of LN models, those that exhibit contrast gain control have the special property that the spike-triggered 
filtered stimulus distribution is independent of a when expressed in terms of the ratio ^ : 



Pa[s x {t)\sp] -> p 



~s x (t) 




sp 


a 





(26) 



For example, Gaussian p obeys this property as long as (s x |sp) oc a and Var[sa;|sp] oc cr 2 , whereas a simple 
thresholding unit with threshold s t h ^ and p [s|sp] a H(s — s t h), does not0 Comparison with Eq. (f55j) 
shows that the rate estimation function factors into a cr-dependent multiplicative gain given by the mean 
rate times a er-independent function, T(— ) : 



R a [s x (t)]=R a T 



S x {t) 



(27) 



In a system that shows this form of gain control, the thresholding on the filtered stimulus is no longer 
absolute. While the mean (time-averaged) firing rate, Eq. (|47|). depends on the input standard deviation 
or typical scale, the temporal modulation of the firing rate relative to the mean does not, FigJS] One 
may consider this behavior to represent a form of multiplexing where different aspects of the stimulus are 
encoded at different observational timescales [12l[15j[36] : individual spikes transmit context- independent 
information about the presence of a particular feature, while the mean rate averaged over time transmits 
the contextual background information. Previous work has noted aspects of contrast gain control in the 
LIF model [T51I5TH55] . but we believe this is the first report of perfect contrast gain control in the LIF 
model. This result is surprising because the LIF model is so simple — it accomplishes this without any 
nonlinear processes other than the instantaneous spike generating mechanism. 

The observation of perfect contrast gain control for large a is confirmed with asymptotic results in Section: 
LIF model: large input standard deviation. As already discussed, for large cr, the optimally predictive 
filter is given by the STA. The STA is independent of a in the limit: 




5dt 



H 



T 

* + 5 



(28) 



for t < for sufficiently small dt, up to normalization; this is the limiting case of Eq. (f8"Tj) with fcoo given 
in Eq. (|89|) . The limiting rate estimation function for large s x is 



Ro 



-max(/i. 



1 - 



max(ft s ) 



(29) 



from Eq. ([88]); the stimulus threshold scales with a and is approximately s t h ~ 0y ^, Eq. (f20|). consistent 
with spiking trajectories effectively starting from the mean voltage state between spikes. This manifestly 
takes the form required for contrast gain control as shown in Eq. (|27p : the rate estimation function is 
a function of the relative filtered stimulus, s s /er, scaled by the mean firing rate. Detailed results for the 
LN models are shown in Fig. [T2R. 

The absolute scale set by the threshold at Sth = V%{vth — v a ) seen for low a has vanished from the LN 
model. The disappearance of this intrinsic scale is a stochastically emergent phenomenon, established only 
by the interaction of linear integration and instantaneous spike generation. How does this stochastically 
emergent computation come about? 



5 Distributional scaling is denned in terms of the behavior inside an integral. The precise meaning of Eq. J26I) is: 
fdsp[s\ = fd(L) P [l]. For example, zero-mean Gaussian P [s) = -^.e" £ transforms as P [±] = ap[s]. 
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Figure 5. Temporal modulation of the spike times in the LIF model. A. Normalized 
instantaneous firing rate estimate (STA-based), R a [s s (t)] /R a , in the perfect contrast gain control 
regime [a > 4) for a realization of the input current. B. Spike time rasters for a fixed realization of the 
input current presented with increasing a. The spike times of the LIF models correlate with the firing 
rate estimate of the LN model. Event times are reliably predicted by the LN model and are not 
dependent on a. There is a cr-dependent overall increase in the mean firing rate which primarily 
increases the number of spikes per event. 



Principles of perfect contrast gain control in simple models 

In the large a limit, the LIF model shows perfect contrast gain control because it becomes statistically 
invariant to changes in a. The reason for this is evident in the complementary limit where the distances 
between reset and threshold, and resting and threshold, go to zero: the voltage dynamics are linear 
with no fixed scale. All input integration occurs in the half-plane below threshold, and without a fixed 
scale, there is nothing to define a typical size of input for which the model is selective. As shown in Eq. 
(|4"8")l . the steady-state voltage distribution in terms of ^ becomes a half-Gaussian and scales linearly with 
contrast. The contrast invariance is a global phenomenon in the sense that it follows from the statistics 
of the voltage everywhere in the sub-threshold domain and not just at threshold. See Fig. [6]?V,B for a 
depiction. 

By comparing the definition of contrast gain control in Eq. (|27j) with the general form of the rate 
estimation function in terms of the voltage distribution in Eq. (|10[) , we can identify the dynamical 
properties that guarantee perfect contrast gain control for integratc-and-firc models. In general, it must 
be true that: 

^L Pa [v thi<7 \s]=R t7 T(-), (30) 
V2irTdt ' W 

for some function T. For integrate-and-fire models without any additional dynamics, the two properties 

that self-consistently guarantee that Eq. (|30| holds are 



R a oc er, (31) 
-v-v th , a (32) 



Pcr[l>|s] -> p 



The mean rate constraint states that the rate must scale linearly to account for the a-dependent coefficient 
in Eq. (|30[) . The second property states that the distribution of the estimated voltage, p[w|s], must depend 
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Figure 6. Automatic perfect contrast gain control is a generic property of neurons that are 
approximately linear below threshold. All panels: standard deviation a (blue) and 2a (green). A. In a 
neuron that is linear below threshold, voltage trajectories are realizations of the Ornstein-Uhlenbeck 
process between spikes [2]. Starting from the resting potential, typical trajectories wander from rest 
before returning with average displacement proportional to the input standard deviation. The ensemble 
of trajectories can be broken into those that travel below rest before returning (solid) and those that 
travel above (dotted). When the spike-generating mechanism is included and v t h — v a —> 0, all 
trajectories that would extend above threshold arc killed off by the spike, but otherwise the symmetry 
is undisturbed and typical voltage fluctuations, which result from linearly integrating the input, remain 
proportional to the input standard deviation. B. LIF model: steady-state voltage distributions, p a [v] 
(Eq. ((48])), in the perfect contrast gain control regime for v r = v . Outside of the small region between 
rest and threshold, the distributions are half-Gaussian and scale with a, reflecting the subthreshold 
linear behavior required for perfect gain control in simple neurons. Also note the fixed point at p[vth\- 
As Vth ~ v ° 0, the symmetry and scaling become perfect, as predicted by Eq. (|57|) . C. Voltage 
distributions with varying A but fixed leak time constant and v r — v a ; a such that the voltage 
distribution below rest is fixed. When a neuron is passive below rest, in the limit of small vth — v , the 
distributions exhibit perfect contrast gain control regardless of the details of the spike driving currents. 
Kinetics only affect the encoding for voltages above v , where the voltage spends little time. D. EIF 
model steady state distributions in the perfect contrast gain control regime for the optimal parameters 
in Eqs. (|40| and (|4Tj) . Below rest, the distributions scale with a as in panels A and B, again reflecting 
the gain control implemented via subthreshold integration. Above rest, the exponential excitability 
facilitates threshold crossing, shrinking the effective distance to threshold toward zero, and effectively 
expanding the range of actual threshold distances that is compatible with perfect gain control. For the 
optimal parameter set, the excitable dynamics facilitate gain control without otherwise disturbing the 
subthreshold integration, either through decreased sensitivity to the input (faster leak) or stronger 
inter-spike interactions (reset significantly different from rest). 



only on the relative filtered stimulus and the scale 
threshold (not scaled), p[vth,a\s/cr], is independent i 
of the steady-state voltage distribution (Eq. (|46p). 
in terms of v/a only: 

Pc[v\ - 

this is required since s is a linear estimator of v, am 



i voltage, and that the probability of the voltage at 
f a. Averaged over s, Eq. (|32[) implies two properties 
First, the steady-state distribution can be expressed 

(33) 

L(T J 

so the filtered stimulus inherits its scaling properties 
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from the voltage. Second, the probability density at the threshold is independent of er: 

Pa[vth,a] = const. (34) 

which is derived by averaging Eq. (|30[) over s when it holdsjf] The scaling properties of the steady-state 
distribution imply scaling in the shifted moments: 

{(vth,tr - v) n ) = d"/i„, (35) 

where ^„ are constants. For the EIF model, these constraints on the voltage distribution need only hold 
for v < vth,a since the behavior of the voltage during the spike is irrelevant to the coding. 

The constraints that allow for perfect gain control in Eqs. (|3ip . ([34]) . and (|35[) cannot always be satisfied 
simultaneously. We apply our general principles to reproduce the results for the LIF model. First, the 
mean firing rate is linear in a for large a as shown in Eq. (|47[) . Second, when the mean firing rate is 

linear in a, the probability density at threshold is fixed; using Eq. (|52[) . p a [vth] — > — ~ \f^- Third, 
the first shifted moment follows from Eq. f)44f) . Averaging over the input current gives the relationship 
between the mean rate and the mean voltage: 

(v) =v a - (v t h - V r )TR a . (36) 

Plugging that into the first shifted moment in Eq. (|35p constrains the mean firing rate: 

- Hi v th - v 

R a T = a . (37) 

Vth - V r Vth - V r 

This second rate constraint arrived at from the scaling behavior of the voltage distribution is only con- 
sistent with the original rate constraint in Eq. (|31[) when a 3> v t h ~ v a , independent of vth — v r . Thus, 
these constraints recapitulate the observed result: the LIF model shows perfect contrast gain control for 
large a. For mathematical subtleties and treatment of the higher order moments, see Section: Higher 
order moment constraints for contrast gain control. 

The mechanism for gain control is elucidated by Eq. (j3"6")l : the mean voltage is set by the post-spike 
current, driven by the firing rate, averaged by the linear membrane dynamics. This gives the effective 
baseline state of the neuron between spikes, or "operating point", of the model. Moreover, the linear 
subthreshold voltage dynamics guarantee that the mean rate and mean voltage are linearly related for 
all a. Thus, when the mean rate is linear with a and sufficiently large, the typical size of fluctuation 
needed to drive a spike — the distance between threshold and the mean subthreshold voltage — scales with 
the typical input size and thus the response is normalized. 

It is noteworthy that the spike-generating current is responsible for implementing both the slow and fast 
timcscales in the code. The mean rate carries the information about the statistical context, a, while 
individual spikes are strongly correlated with the appearance of the feature defined by the optimal filter 
[T31[T51[2D] ■ This multiplexed code can be generated with a single active current because the membrane 
leak is slow compared to the spike-generating current. The mean state of the neuron between spikes is 
only sensitive to the mean rate, and so there is still freedom for precise modulation of the spike times on 
shorter time scales. 

6 The required invariance of the probability density at threshold to changes in a does not constrain the threshold to take a 
fixed value. Furthermore, since contrast gain control depends on the behavior of the voltage everywhere in the subthreshold 
domain, its existence does not depend strongly on the precise definition of v t ^ a . See Fig. |6p. 
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Contrast gain control in the EIF model 

The scaling properties in Eqs. (f3"Tj) and (j3"2"f are asymptotically guaranteed to hold in the LIF model 
for all parameters. However, this limit is not very biophysically plausible as the required firing rates 
are such that many spikes are typically fired in the integration window of the membrane and the input- 
induced membrane fluctuations are much larger than the intrinsic subthreshold voltage scale. Can the 
more realistic EIF model also show perfect contrast gain control? 

The test of our theory is whether it allows us to predict the code from the dynamics prior to identify- 
ing the code with reverse correlation analysis. Accordingly, we now use the tools we have developed to 
predict parameter regimes for the EIF model that lead to perfect contrast gain control in the LN model 
representation of the encoding. All derived results are based solely on the scaling properties of the mean 
dynamical response given in Eqs. (|31|) and (j35]); we show empirical LN models as validation. 

The limit of small distance to threshold 

In addition to the mean rate constraint in Eq. pip , we show in Section: Moment constraints for contrast 
gain control in the EIF model, Eq. (|92[) . that subthreshold scaling with a in the EIF model implies a 
second linear rate constraint of the form in Eq. (|37[) . Thus, both the gain control constraints arc satisfied 
in the limit where a 3> v t h — v — > 0, with little sensitivity to the spike generating parameters, A and v r . 
This corresponds to the limit discussed previously where there is no relevant fixed scale below threshold; 
see Fig. [6K — C. Unlike in the LIF model, the firing rate is reasonable in this limit because the spike has 
finite duration. 

It is intriguing that the EIF model in this limit shows perfect gain control. The EIF model is equivalent to 
the normal form of a Hodgkin Type I neuron near the bifurcation from excitable to tonic spiking |41j , and 
so qualitatively describes all Type I neurons when they are near bifurcation. Thus, we have discovered a 
generic, model-independent dynamical state that performs stochastically emergent perfect contrast gain 
control without any slow adaptive processes. However, as this generic state is tied to the bifurcation 
point, it is finely-tuned and thus fragile to small changes in the mean of the input. 

As this limit is somewhat trivial, we will now use the constraints to find a range of parameters for the EIF 
model that best implement approximately perfect contrast gain control when the input is not asymptot- 
ically large. 

Perfect contrast gain control with finite distance to threshold 

In Eq. ([9T]) . we derive that the contrast gain control constraints are approximately self-consistently 
satisfied for all a for Vt ft v ~^ lj an d this result is insensitive to the reset voltage, v r . To verify this 
approximate analytic result, we searched over v r and A, holding all other parameters constant, to find 
the models that the constraints predict will show maximally perfect contrast gain control for inputs in the 
range 0.5 < a < 2. We evaluated the deviation of the mean firing rate from best-fit linear prediction. For 
constraint Eq. ([3Tj) . the error, E^, is calculated with respect to the best fit with zero intercept: 




where is the best fit mean firing rate vs. a assuming zero intercept. The error of the constraint from 
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the scaling of the voltage distribution in Eq. (j9"2")l . E^ v \ is: 



E^=M[R a -R^) 2 y (39) 

- ( v \ 

where R& is the best fit mean firing rate vs. a with unconstrained intercept. Perfect contrast gain 
control in the LN models follows from parameter sets in the EIF model for which the above errors are 
minimized, with perfect gain control across all tr occurring in the limit that and E^ both go to 
zero. Results Fig. [7]A.,B support the analytic result that the constraints are best satisfied for A — > I with 
less sensitivity to v r . In Fig. [7p, we show the mean Jenson-Shannon divergence, Eq. (|56]). to directly 
verify for the corresponding LN models that the precision of gain control is predicted by the precision 
of the constraints. The results show that contrast gain control improves as the activation parameter, A, 
gets larger. For — — — > 0.25, the mean divergence is within a factor of two of the minimal value for the 

t> fc> Vth-V ~ ' b 

parameter range shown. 



Optimally selective contrast gain control 



While contrast gain control is more precise as A increases, the precision comes at the cost of decreased 
coding effectiveness: the fraction of information per spike captured by the LN model about the filtered 
stimulus in the dynamical model decreases, Fig. 03. From the perspective that the dynamics are the 
fundamental representation of the computation, the LN model is an increasingly incomplete description 
of how the dynamics encode the total input current. However, from the alternate perspective that the 
LN model is fundamental — that the encoding task is to select one feature as relevant and treat all other 
components of the input as background noise — the EIF model is an increasingly unreliable encoder of 
the filtered stimulus because it is increasingly sensitive to irrelevant background input. 

The decrease in coding selectivitiy with increasing A occurs because the larger nonlinear current, in 
addition to facilitating spiking, also decreases the hypcrpolarizcd leak time constant, tl, as shown in Eq. 
(|43|) . The increased leakiness below the resting potential reduces the role of linear integration in spike 
generation and increases the sensitivity to faster components of the input relative to the STA-filtcrcd 
stimulus. The increase in the subthreshold leak manifests itself as a reduction of the dynamic range of 
the response of the EIF model to the input, which in turn increases the dependence of spiking on faster 
components of the input. For fixed finite a, relative to the response of the LIF model with A = 0, the 
mean firing rate for finite A is reduced: 

R^ Tl 
i?< 0) ~ T ' 

as can be shown from the steady state voltage distribution in Eq. (|46|) via the re-paramctcrization, 
v —> yJ^-v, for v r ~ v a . As discussed previously after Eq. (|36l) . the mean rate controls the mean 
subthreshold voltage, giving: 

(v\v < Vth,a)^ A) _ TL 
(V\V < V th . a ) {0) ~ T ' 

and in the gain control regime, the mean voltage controls all higher order moments: 

{{y - (v\v < v t h,a)) n \v < Vth,cr) {A) _ CJ±\ n 
((v - (v\v < v tKa )) n \v < v th , a )W ~ \~) ' 
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Figure 7. Optimal parameters for contrast gain control in the EIF model. All quantities are 
means taken over 0.5 < a < 2. A. Error, Eq. (|38|) . of the rate constraint in Eq. (|3ip . B. Error, Eq. 
(|39|) , of second rate constraint in Eq. (|92j) . Panels A and B together show that the dynamical 
constraints for perfect gain control are best satisfied for larger A, with weak dependence on v r . C. 
Mean Jenson-Shannon divergence, Eq. (|56[) . measuring similarity of p CT [s|sp] across a; y-axis normalized 
relative to minimum D$j = 0.05 bits. As predicted by the rate constraints, gain control in the LN 
model is best realized for larger A. The precision of the gain control is insensitive to v r . D. Coding 
selectivity: fraction of mutual information per spike captured by the LN model relative to the 
deterministic spike train, as in Fig. at temporal precision r/40. As A increases, the EIF model is 
an increasingly unreliable encoder of the filtered stimulus because of increased sensitivity to additional 
components of the input. E. Optimally selective contrast gain control: mean standard deviation (Eq. 
([39)0 of the rate constraint, normalized by the mean rate dynamic range. Minimum (black dot, also 
panels C,D) is the parameter set that best compromises between demonstrating selective encoding of 
the filtered stimulus and precise contrast gain control, Eqs. (|40| and (|4Tj). Solid blue region shows 
parameters that do not differ significantly from the minimum. See also Fig. [6)3. 

Thus, all subthreshold moments are smaller: changes to a are less relevant to the dynamics for larger 
A and the voltage distribution is more narrowly concentrated with more mass near threshold, reflecting 
increased sensitivity to faster fluctuations in the input. 

If the encoding of the filtered stimulus remained selective to a single feature in this context, increased 
leak would be a viable mechanism for contrast gain control in LN models. As this is not the case, the 
gain control becomes increasingly trivial as A gets larger: while there is invariance, it is with respect to 
a feature to which the system is increasingly insensitive. 

A selective, non-trivial contrast gain control regime is achieved by the EIF model with parameters that 
best compromise between implementing precise gain control while maintaining the selectiveness of the 
encoding by preserving subthreshold integration. The optimal parameter set can be identified by min- 
imizing the error in the rate constraint in ([39]) relative to mean rate dynamic range: — E _„ The 
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optimal parameter set centered at: 

A 

v t h - v 

Vr - Vo 
Vth - V 

as shown in Fig. 03. 

The parameter set in Eqs. (|40|) and ([41]) was used to generate the LN models shown in Fig. [3j Comparison 
with the LIF model for equivalent inputs is shown in Fig. [8j The rate estimation functions show precise 
contrast gain control for 0.8 < a < 2, as opposed to for a > 4 for the LIF model, a special case of the 
EIF model. Furthermore, the mean firing rates are reasonably low: R ~ 0.3/r (see Fig. [3J)). Some 
deviation from perfect contrast-invariant coding appears in the STAs in Fig. [3}U. The long time behavior 
of the filters is exponential and independent of a, agreeing with the prediction from Eq. (|25[) : k a « 1.9. 
However, the filter at short times prior to the spike is sensitive to a. In particular, for a < 0.8, the STA is 
still non-monotonic at short times, showing the influence of the subthreshold nonlinearity in f(v). 



B C 




time 

Figure 8. Contrast gain control comparison: EIF v. LIF models. EIF model (solid), LIF 
model (dashed). A. STA-based rate estimation functions for each model in for 0.8 < a < 2. By design, 
the parameters used in Eqs. (|40|) and (j4Tj) lead to an EIF model that shows good contrast gain control 
for a v t h — v a , whereas the asymptotic gain control in the LIF model (a limiting case of the EIF 
model) is not yet achieved. B, EIF model and C, LIF model; same as in panel A, but for larger input 
range: 0.45 < a < 2. D. STA for a = 1. The filters are almost identical for both models. The changes 
in the coding are entirely due to the differences in the spike-generating currents. 

In contrast to the LIF limit (A — > 0), in the EIF model the spike-generating currents are not perfectly 
time-locked, and so gain control is achieved with a less selective encoding, as shown by the broadened 
rate estimation functions in Fig. [8jA_ and relative information in Fig. 03. The subthreshold nonlinearity 
also introduces a second imperfection in that the STA filter — the selected feature — is not independent 
of a. However, the deviation in the STA is due entirely to the highest frequencies in the input. In the 
perfect gain control regime, LN models built on the stochastic linearization filter with k a = 1.9 capture 
90% of the information captured by the optimal filter at temporal resolution r/40. Furthermore, if the 
noise correlation time is increased to r/8, the filters become essentially pure exponentials and the rate 
estimation functions are qualitatively unchanged (not shown). 



0.25, 
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Discussion 

In this paper, we have addressed a fundamental question in computational ncuroscience: how do the 
dynamical properties of a neuron determine its computational properties? More abstractly, what is the 
relationship between mechanism and meaning? The framework we have used to address this question 
is one in which the input is taken to be random yet fully specified; dynamics are one-dimensional; and 
computation is defined in terms of a " pure" coding model that does not account at all for mechanism, the 
linear/nonlinear model. While we used the linear integrate- and- fire model as a stepping stone to develop 
our approach, our focus was on the exponential integrate-and-fire model. Although this is still a simple 
model, taking it seriously is justified by its success in fitting data from certain neuronal types [3J|2J|B]. 
This work makes two major and several minor novel contributions. First, we derive an expression for 
the nonlinear decision function of the resulting LN model from first principles and estimate its form in 
certain limits. This requires us to revisit the notion of threshold-crossing in nonlinearly excitable models 
and the derivation of appropriate filters for these models, issues which have some surprising subtleties. 
Second, we examine the requirements for a simple dynamical neuronal model to exhibit perfect contrast 
gain control. We find that the LIF model automatically displays this property when the input variance is 
large due to the loss of a typical voltage scale. Based on findings in the LIF model, we are able to show 
that this property also holds for certain parameter regimes in the EIF model. 

The essential concept in our approach is that the filtered stimulus in the LN model is a linear estimator 
of the voltage in the dynamical model. Building from previous work [TJ[3ni[22, 28, 32;, we identified the 
optimal filter as the maximally informative instantaneous linear estimator of the threshold voltage state. 
As has been previously noted, the optimal filter "adapts" , or depends on the stimulus variance, because it 
has to account for the influence of the nonlinear dynamics associated with spiking on input integration in 
the subthreshold regime 1 21) In particular, filter timescales in simple neurons typically grow shorter 
as the mean firing rate of the system increases because spiking leads to forgetting: the stereotyped, input- 
independent spike-generating currents decorrelate the subthreshold voltage with past values of the input. 
By focusing on the linear prediction error, we provided a qualitative recipe to identify the properties of the 
optimal filter as a function of input statistics. Furthermore, by minimizing the mean-square prediction 
error below threshold, we showed how to identify the exponential tail of the optimal filter from first 
principles, accounting for the mean influence of inter-spike interactions on the computation performed on 
the input. For the LIF model, we identified a semi-empirical closed form for the STA by combining our 
novel results for the long-time behavior of the filter with previous results for the short-time behavior of 
the spike-triggered average [2TJ[24]. While we do not yet have a complete derivation, we hope this piece 
of progress on a long-standing question will prove useful. 

Given the filtered stimulus, we showed that the nonlinear rate estimation function is determined by 
the precision of the estimate of the threshold voltage state. For models with continuous dynamics for 
spike generation, there is no unique threshold voltage state. For an example, the EIF model, we pro- 
vided a recipe to identify an appropriate choice of the threshold voltage state — the stochastic dynamical 
threshold — that captures how the choice to assign a unique spike time to a continuous spike waveform 
depends on the spike-generating dynamics and the input statistics. Our definition generalizes the con- 
cept of an intrinsic dynamical threshold found in previous work [2,22,28,41] to better represent spike 
initiation behavior with finite noise strength. LN models are imperfect predictors of the spike train of 
the dynamical model because the linear estimate is necessarily imperfect. 

To study the voltage estimation problem analytically, we introduced the conditional dynamical process 
that is completely specified by p[v(t')\v(t' — dt), s(t)] in Eq. (|70|) . It is a time-inhomogeneous (non- 
stationary) Markov process for fixed t, and the general two-parameter process for flowing t is both non- 
stationary and non-Markov. This turns out to be a specific example of a conditional Markov process, 
first introduced by Stratonovich |42 [ l43 ] . To construct the conditional dynamical process, we studied the 
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input conditioned on observing the filtered stimulus, p[i(f')|s(f)] . The conditional input distribution is a 
precise representation of the concept of feature selection. The input, i(t), is decomposed into an observed 
component, s(t), and a background noise source whose strength is characterized by the input standard 
deviation, a. Because we use the dynamics to identify the relevant component that is observed by the 
neuron, this post hoc separation of source and signal can be thought of as "taking the neuron's perspective" 
where it must automatically separate background from signal without any a priori knowledge. 

In contrast, the recent work on the derivation of LN models from integrate-and-fire models by Ostojic 
and Brunei |44j assumes that the signal is a priori separable from the noise background, as has been 
the case for all previous analytic work on reducing spiking models to rate models known to us (most 
notably also including Plesser et al [45|). Our formulation is equivalent to the approach of Ostojic and 
Brunei if we break the correlation of the signal and the noise in our conditional input distribution, and 
instead take the conditional input distribution to be characterized by the moments (i(t')\s(t')) = s(t') 
and <j>a\ s {t', t") = <r 2 T5(t' — t") in place of Eqs. (|57)) and (JB5J). Under this assumption about the a priori 
independence of signal and noise, the conditional dynamical process remains Markov, which enables those 
authors to use elegant Fokker-Planck methods to derive the filter and the rate estimation function in place 
of our more cumbersome voltage state-space approach. 

Given white noise inputs, we showed that the LIF model shows two distinct computational regimes 
depending on the statistical context. In the small and large background limits, the LN model is predictive 
of the dynamical response and yet the codes are fundamentally different. This is an example of context- 
dependent coding. In the small background limit, the LIF model integrates the input with a fixed filter 
and fires spikes based on an absolute threshold, while for large background, the LIF model uses single 
spikes to encode relative fluctuations of a feature represented by a different filter. Thus, the simplest 
possible intrinsic neural dynamics interact with the input to perform sophisticated adaptive computation. 
Furthermore, if we were to present an input with slowly changing standard deviation, the adaptation of 
the code happens on the timescale of a single inter-spike interval because these models are renewal 
processes. In other words, complete adaptation occurs after a single spike. This work may prove to be 
useful for understanding very rapid adaptive changes observed in biological systems, including the fly HI 
neuron [12] and the retina [T71H8]. 

While previous work has shown gain control in the LIF model }13[[571I55] . this is the first study to show 
the limit of perfect gain control. By studying p[w(i)|s(t)] , we identified the general principles behind 
this adaptive computation and demonstrated that this type of gain control is generic to Type 1 neurons 
whenever the neuron is close to bifurcation. Furthermore, we were able to use the dynamical constraints 
associated with perfect contrast gain control to derive optimal parameters for the EIF model to show 
approximately perfect gain control for finite distance between rest and threshold. To our knowledge, 
this is the first time theory has been used to design from first principles a dynamical neuronal model to 
implement a specific type of code. 

The concepts we used to derive LN models from dynamical models directly generalize to more biophys- 
ically plausible dynamical neuron models. First, the EIF model has been shown to be a useful general 
reduction of more complex biophysical models and can reproduce cortical cell recordings [3113] and so our 
results will be directly applicable when that is the case. However, more generally, neuronal systems are 
described by multidimensional models. Our approach to the voltage estimation problem can be extended 
to include considerations of the states of ion channels. The stochastic dynamical threshold defined sim- 
ilarly will generically be multi-dimensional, and the space of relevant filters may be multidimensional 
as well [2[521[5H]. Explicit spike- history dependence can also be included in the conditional dynamics 
framework as an additional spike-dependent filter, as in generalized linear models |46j . 

The concept of the conditional dynamical process may prove more generally useful for studying the 
dynamical implementation of neural computation. Important goals of research in neural computation are 
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to identify the nervous system's abstract representations of relevant stimuli, internal states and behaviors, 
and to discover the biological mechanisms used to implement and manipulate these abstractions. Here, 
we used the conditional dynamical process to associate an abstract quantity, the filtered stimulus, with 
sub-ensembles of states of a dynamical system, the sets of voltage trajectories that cross threshold and 
are consistent with a given value of the filtered stimulus. This tool provided the bridge that enabled 
us to use methods from cither coding or dynamics to simultaneously study both. While the problem 
studied here is often analytically tractable because the underlying unconditional dynamics are Markov, 
the dimensionality is low, and feature selection is linear, these details are not essential to the concept. 
As similar ideas were first introduced by Stratonovich in the contexts of state estimation and nonlinear 
control [42]143j, we are optimistic that this work will lead to a program to import existing techniques 
from those areas that can be used to study the biophysics of neural coding and to deepen the connections 
between neural coding and control. 

Beyond computational neuroscience, the conditional process derived here provides what appears to be 
a novel approach to studying the accuracy of the stochastic linearization approximation to a nonlinear 
system. Stochastic (or statistical) linearization has primarily found applications in nonlinear control 
and the study of engineered structures subject to random vibrations such as earthquakes and ocean 
waves [47]. An underdeveloped aspect of the theory of stochastic linearization is error analysis of the 
linear approximation [48l|49]. The conditional distribution, p[w(£)|s(f)], is the distribution of the true 
state of the system given the linear approximation, and thus it captures completely the error of the 
approximation locally in state space. This has the potential to be a more precise tool than the mean-square 
error and steady-state distribution-based metrics that are commonly used. Such state-dependent error 
estimation can be important when tolerances or failure modes arc poorly described by global measures 
of error [501151] . We hope that this work may seed further development in this domain. 

Models and Methods 
Integrate-and-fire models 

We work with the integrate-and-fire models, and we restrict ourselves to excitable cases where model 
exhibits a stable rest state in the absence of input. We parameterize the models as: 

tv = -v + v + f(v) - (v s - v r ) rR(t) + i(t), 

/(«) = (v th - «.) (e^ - (l + ^) e^) (l - (l + e^) " , (42) 

R{t) =5(v-v s )vR(v), 

where v a is the resting potential for zero input, v r is the reset voltage immediately after a spike, r is 
the "physiological" membrane time constant near rest, v s is the peak voltage of the spike and the input 
current is in units of voltage (the input resistance is set to one) for convenience. The function f(v) is 
the exponential voltage-activated spike generating current, where A sets the activation scale over which 
the spike-driving excitable current turns on. There is an unstable fixed point at v t u (for v t u > v ) that 
acts as the intrinsic dynamical threshold for pulse-like inputs [2,22,28 30 ,i4T] . This form of f{v) has been 
chosen so that the resting potential, threshold, and membrane time constant at rest are independent of 
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other parameters. The leak time constant, tl, for hyperpolarizcd voltages, v <C v Q — A, is: 

TL=r\l-j^ r —,\, (43) 

for ^« 1 > 
^i(^) for 

We describe the dynamics of the after-spike reset to v r with the deterministic instantaneous firing rate, 
i?(i), given above. The continuous-time, voltage-based reset operation, (v r — v s ) r 5(v — v s )vii(v), is read 
as "when the voltage reaches the spike height v s from below at time t, reset the voltage to v r ." [55] 

Of the model's six parameters, only two are meaningful from the perspective of the geometry of the 
dynamics; the others determine units and the finite cut-off for the spike height (to which the model is 
quite insensitive [52]). We treat A and v r as the meaningful free parameters. 

For finite A, Eq. (|42j) is the exponential integrate-and-fire (EIF) model [6]. In the limit of A = 0, we 
recover the leaky integrate-and-fire (LIF) model. Considered this way, the LIF model is linear below 
vth, and whenever the voltage exceeds vth, it instantaneously jumps to v s and back down to v r (where 
instantaneously means faster than the shortest timescale in the stimulus or subthreshold dynamics). The 
usual definition of the LIF model with f(v) = and v s = Vth is equivalent in probability since the time 
spent in the interval between v t h and v s is zero. In the limit of A — > oo, the model becomes the quadratic 
integrate-and-fire model [53] : we do not attend to this limit in this paper. 

In the figures, all results are displayed in terms of the intrinsic scales, v t h — v Q and r. In LIF simulations, 
we used v r = v Q , and no results presented here qualitatively depend on this choice. For the EIF, in 
simulations, we used the parameters in Eqs. (|40[) and (|41 [) . and v s — v = 20 (v t h — v Q ). 

It is useful to view the model in integral form: 

v(t) = v + [ —e^ [i(t') + f(v(t')) ~ (v a - v r ) TR{t')] , (44) 
Jo T 

ignoring any initial transient. Thus, the voltage is determined by the action of an exponential filter on 
the input current and the nonlinear currents due to spiking at times prior to t. The filter is determined 
by the linear membrane dynamics near rest and so we refer to it as the intrinsic membrane filter: 

h m (t) = V2e-rR[t] , (45) 

where the V2 comes from our normalization convention for filters (/ ^rh(t') 2 = 1) and the Heaviside 
function accounts for causality. 

We examine how a spiking dynamical neuron encodes a realization of a stimulus drawn from a given 
stimulus ensemble. We will work with Gaussian white noise input currents with fixed mean and autocor- 
relation. Such stimuli richly explore the computational properties of the neuron [54] . While the elegant 
mathematical properties of such a stimulus is helpful, it may also be a reasonable simplified model of 
synaptic inputs in cortex )55) . We consider the dynamics to be a deterministic response to a known re- 
alization of white noise. We will consider input ensembles with zero mean and autocorrelation function, 

4>u'- 

<j>u(t - = (i(W)) - (i(t))(i(t')) = a 2 rd(t - t>) , 



where a characterizes the typical scale of fluctuations. 
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For integrate-and-fire models driven by white noise, the steady state voltage distribution and the mean 
firing rate can be calculated directly from the Fokker-Planck formulation of the model i56 . Previous 
work [B1I3H] has shown that: 

pM = — — e ^ / dv e 53 . ( 46 ) 

where F(v) = Jdv f(v) and the mean rate, R a is found from the normalization condition: Jdvp a [v] = 
1. 

For the LIF model, in the limit where the input standard deviation is large compared to v t h — v and 
t>th — IV, the mean rate is: 

Ra^oo = — —, r— , (47) 

V^^th - V r )T 

and the steady state voltage distribution is: 

v 2 -v 2 

-4=- if v r < v < v th , 

as can be derived by manipulating Eq. (|46|) with J^ +e f(y)dy ~ tf(x) in mind. 



Discretization and regularization 

In this work, we use both continuous and discrete time notations, depending on which is more natural. 
All continuous time expressions should be interpreted in the Ito sense, and all discrete time expressions 
are given by the corresponding Eulcr- forward discretization of the continuous system |56j . Explicitly, in 
discrete time, the models become: 

, s_ ( v(t-dt) + f[-v(t-dt)+v + f(v(t-dt))+i(t-dt)] if v(t - dt) < v s , . . 

V{ ' ~\ V r + f [~V r + Vo + f(v r ) + i(t - dt)] if V(t - dt) > V s , ( ' 

where we have defined the reset to be non-anticipating. In this discretization, spikes are reset in the EIF 
model whenever v(t) > v s , and spikes occur in the LIF model whenever v(t) > vth (since again, for A = 0, 
any voltage above vth is instantaneously sent above v s ). For clarity of exposition, we find it natural to 
describe the input in terms of the physical current i(t). To preserve the white noise statistics of the input, 
the discretization of the current is: i(t) = Vy^S.i't), where £(t) is discrete Brownian motion with zero 
mean and unit variance. The input current strength is thus a function of the discretization time scale, 
and, in the continuous limit, diverges. This is the usual pathology that often arises when taking white 
noise seriously in a physical setting. 

The instantaneous discontinuity at threshold in the LIF model causes an additional pathology in the 
continuous limit: the spike-triggered average input (STA) current (Eq. (|53|) ). is a singular function of 
dt and diverges (2TJ[24] . For finite dt, this manifests as a boundary-crossing contribution to the STA 
that exists at short times prior to the spike and is proportional to a in amplitude [20][2TJ[24] . We study 
the STA in more detail in the Section: Semi- empirical closed form for the STA of the LIF model. This 
boundary effect also exists in the EIF model, but it is generally negligible because the approach to the 
reset at short times is dominated by the intrinsic dynamics of the exponential current and the input plays 
essentially no role. 

A simpler divergence also exists in the STA of the EIF model when triggered on the stochastic dy- 
namical threshold defined in Eq. ([6]). With true white noise inputs, the condition that threshold is 
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crossed from below biases the average input current in the last sample preceding the spike to be posi- 
tive. This leads to a "delta- function" component in the STA immediately preceding the spike 20,22,28 . 
This threshold-crossing component only has support over one time bin and appears generically in any 
continuous dynamical model driven by white noise. 

In a physical neuron, the pathologies associated with true white noise are not relevant because inputs 
have finite correlation time and true spiking mechanisms are not infinitely fast. Thus, throughout this 
work, there is an implicit regularization: dt should be thought of as the small-but-finite correlation time 
of the input. In expressions that are sensitive to this regularization, dt appears explicitly; otherwise, dt 
naturally drops out in the limit. All time series are displayed with resolution dt = at larger variances, 
the simulations were run with smaller hidden time step and downsampled for presentation. 



The discrete time model can be expressed as a transition probability. Eq. (|49| is equivalent to the 
transition probability density: 



p[v(t)\v(t - dt),i(t - dt)] = 



dt 



v(t) - v(t -dt) [-v(t - dt) + v + f(v(t - dt)) + i(t - dt)] 



H[v th -v(t- dt)] (50) 



+ 5 



v(t) - v r - — [—v r + v + f (v r ) + i(t - dt)] 

T 



B[v(t-dt)-v th ]. 



Marginalizing over the input current gives: 



r u\\ n jj.M R[vth - v(t - dt)] -A ^ >- 

p[v[t)\v(t — at)] = — , -e 



2ira 2 f 
R[v(t - dt) - v th ] 



(«W-»r- ^[-«r+^o + /(»r)]) 2 



'27ra 2 f 

which describes the dynamics of the model if we do not observe the input current at time t — dt. 



(51) 



For the LIF model in discrete time, to order y/dt/r, the steady state voltage distribution in discrete 
time is unchanged from the continuous-time result in Eq. (|46[) except for voltages near vth- The leading 

correction to the steady state distribution for voltages near threshold, v ~ v t h zta^J^-, can be derived by 
propagating the continuous time steady-state distribution forward one time step near threshold for the 
typical range of voltages that can be spanned in one time step: 



P*[v\v > v th ] 



R a dt 



dv'p[v\v']p a [v], 



("~"th) 
2„2 dt 



7V(T 



2 d* 
2t 



(52) 



Identifying LN models with reverse correlation 



All analysis based on simulation data used reverse correlation to find the linear-nonlinear models corre- 
sponding to each stimulus condition [5"lll5"TH5T)] . The standard choice for the filter is the spike-triggered 
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average current (STA). To find the STA, we average the input current preceding each spike: 

1 N 

(i(t)\sp) = -Y J i(t-U), (53) 

i=l 

where the {U} are the times of the spike. For the LIF model, spikes times correspond to the instants 
when the voltage exceeds vth- For the EIF model, we identify spike times as the instants for which the 
voltage crosses the stochastic dynamical threshold defined in Eq. (J5]) from below. 

Given a choice of filter, h x (where the subscript provides a label in context), we construct the filtered 
stimulus, s x (t), by convolving the input current with the filter: 

tit' 

s x (t)= —h x (t-t')i(t') = (h x *i)(t). (54) 
Jo T 

Our filters are causal and so take the form of a continuous function multiplied by the Heaviside step 
function, H(i — t'). Consistency with Ito calculus requires H(0) = [56] . We normalize filters such that 
J^rh(t') 2 = 1. With this choice, the variance of the filtered stimulus is always: (s x (t) ) = a . 

With the filtered stimulus, the distribution of filtered stimuli given a spike, 

Pa[s x (t)\sp] , 

can be sampled with reverse correlation |57j . The rate estimation function for a particular filter, R a [s x (t)] , 
can then be found via Bayes rule: 

R r , A] _ Pg[sp\s x {t)} 
Ra[Sx{t)\ = J t 

_ „ ivMOIsp] 

— -ttcr r /A, J (.^OJ 

Pa[s x {t)\ 

where the prior distribution of stimuli, Pa[s x (t)] is always Gaussian with mean zero and variance a 2 . By 
choosing the normalization of the filter as described above, all changes in gain appear in the shape of 
Ra[sx(t)]- The filter subscript emphasizes that the threshold function and the spike-triggered distribution 
generally depend strongly on the input standard deviation and the choice of filter. Along with the changes 
in the optimally predictive filter, this dependence is a form of adaptive coding that we will elucidate 
here. 



Quantifying the precision of contrast gain control 



In Fig. [TJC, we quantify the precision of the contrast gain control with the Jenson-Shannon divergence [31] 
of the spike-triggered stimulus distribution at a in terms of z = ^- relative to the distribution at a = 1, 
averaged over a, 



D 



SJ 



(iz/pcr^lspjlogj 



Pa[z\sp] 



la- [z\sp] 



+ pi[z|sp] log 2 



mcr[z|sp] 



(56) 



where 7?v[z|sp] = i (p CT [z|sp] +pi[z|sp]). When D$j — > 0, contrast gain control is perfect — the scaling 
relations in Eqs. ([55]) and (|27|) hold for all a. Results are insensitive to this choice of metric. 
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Coincidence factor 

With respect to the encoding used in the decision to fire a spike, the best LN model description of the 
dynamical model optimally predicts the spikes of the deterministic system. To measure the predictive 
power directly, we calculate the coincidence factor from artificial spike trains generated from an inho- 
mogeneous Poisson process with the instantaneous rate given by the LN model. The coincidence factor 
is: 

p coinc {-^ coinc) j\f—l (57) 

5 {Ndata + N mo del) 

where N CO i nc is the number of spikes that coincide within a tolerance ±7, (N CO i nc ) = 2R a ^Ndata is 
the expected number of coincidences for a Poisson spike train with the same rate as the data, and 
Af^ 1 = 1 — 2R a j is a normalization factor [26 . The coincidence factor is zero for random Poisson 
coincidence and is one for spike trains that agree exactly. 



Information analysis 



When the firing rate is the relevant output, the coding efficiency of a neuron can be quantified by 
calculating the information transmitted about the input current by the observation of a spike. As shown 
in references 1,60,61 , the information per spike in bits is 



I[sp;i(t)] 



' dt R(t) 
T 7 ~ R~ 



log 2 



R 



(58) 



where T is the duration of the spike train. For each input standard deviation, Eq. (J58J) can be applied 
directly to the output of the dynamical models when the instantaneous firing rate is sampled with bins 
of finite duration, giving the information I^[sp; i(t)]. 

For the LN models, the information per spike transmitted about the filtered stimulus can be calculated 
similarly: 

~Ro-[Sx] 



[sp;sz(*)] 



, r ,R<t[Sx\ 

ds x p a [s x \—^ — log 2 



Ra 



ds x p a [s x \sp] log 2 



R<J 

pA s x\M 

Po-[s x ] 



(59) 
(60) 



where the time average has been replaced by an ensemble average over the filtered stimulus; the second 
equality uses the definition of the rate estimation function from reverse correlation in Eq. (|55|) . LN models 
are reduced descriptions of the dynamics and so, by the data-processing inequality, I LN < I D . 

The second equality in Eq. (|60[) takes the decoding perspective: how much information does the spiking 
response provide about the stimulus [S3]. From the decoding perspective, the ratio of the LN model 
information to the dynamical model information provides a measure of the completeness of the LN model 
as a decoding model. When the ratio is close to one, the dynamical model can be thought of as an 
implementation of the code based on linear feature selection. For a decoding showing perfect contrast 
gain control, the information per spike is independent of cr, as is easily shown by changing variables to 
s x /a in Eq. 



Filtered stimulus-conditional input ensembles 

In this section, we summarize the necessary mathematical machinery to work with arbitrary filtered 
stimuli in the context of driven dynamical systems. In Eq. (|54p . we define the filtered stimulus as 
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convolution with a filter h x (t — t'), and so filtered stimuli are Gaussian processes [55]. Throughout this 
paper, we assume that inputs are statistically in steady state (4 > 0). To calculate LN models from first 
principles, we need to know the statistics of the white noise input when conditioned on the observation of 
a particular value of the filtered input. We will first characterize the conditional mean and autocorrelation 
functions for the input current, (i(t')\s x (t)} and (j>u\ S:c (t', t"; t). 

Given a filtered observation s x (t), one can invert (deconvolve) the filtering to reconstruct the original 
input current, i(t). In continuous time, the inverse convolution operator, h~ l (t — <'), is the differential 
operator defined by: 



and so performs the operation: 



* dt' » 

— h- 1 (t-t')h x (t')=rS(t-t') (61) 



/"* dt' - 

i(t)= / —h-\t-H)a x {H). (62) 



For causal filters whose inverse operator can be expressed as a power series in the left and right 
inverses are equal: 

dt' - t* dt' 

h- 1 {t-t')h{t')= — hfflh-^t - 1') (63) 



T 



as can be verified by integrating by parts over a test function. For an arbitrary filter, one can construct 
the inverse explicitly |63 | I64 | , but here it is sufficient to work only with the special case of an exponential 
filter. For normalized exponential filters with dimcnsionless inverse time constant k, 



h(t - t') = V2ke- hi ^R(t - t'), 



the inverse convolution operator is 

h-\t-t') = ^k + r^r5it-t% (64) 
as can be verified by substitution into Eq. (|6ip . 

From Eq. (|62p . we can get the properties of i(t') given s x (t) once we have the conditional mean and 
autocorrelation of the filtered stimulus. The mean conditional moment is most easily inferred directly from 
the well-known autocorrelation function of a Gaussian process [62] , using the Law of Total Expectation 
identity, (s x (t')s x (t)) = {(s x (t')\s x (t))s x {t)) = (/> SxSx (f, t"), giving: 

/•min(t,i') ^ 

{s x (t')\s x (t)) = s x (t) —h^t-tjh^t'-h). (65) 

Jo T 

Similarly, the conditional autocorrelation function, is most easily inferred from the Law of Total Covari- 
ance identity, (*',*";*) = ^ Sx (t',t") - ((s x (t')\s x (t))(s x (t")\s x (t)}), and is: 



c \s*(t',t";t) = a 2 



rmm(t',t") j, 
Jo T 



(66) 



rmin(t,t') ^, <.min(i,t") ^. 

/ h x (t - ti)h x (t' - h) —h x {t - h)h x (t" - ti) 

Jo T JO T 
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Note that the conditional autocorrelation does not depend on the value of s x (t). Higher-order correlation 
functions may be constructed from the first two with the usual conditional Gaussian closure relations 
[55]. 

Using the moments of the filtered stimulus, one can now compute the properties of the original white 
noise input ensemble conditioned on observing a particular value of the filtered stimulus. Using Eqs. (|62[) 
and (|65|) . the conditional mean current is 

(*(0I«*(*)>= f — ^" 1 (t , -*i)<«x(t l )|**(t)>, 
Jo T 

= s x (t)h x (t-t'). (67) 

Since the filter is causal, (i(t')\s x (t)) is only non-zero for t' < t. Similarly, we can find the conditional 
autocorrelation function with Eq. (|6"6"|) : 

(*', t"; t) = o 2 [r5{t' - t") - h x (t - t')h x (t - t")} . (68) 

Eqs. (|67[) and (|68[) define an s x -dependent Gaussian source with discrete time conditional distribu- 
tion: 

(i(t')- 3a: (t)h x (t-t')) 2 

p[i(t')\s x (t)} = . ^(i-^-o^) (6Q) 

These expressions capture the bias in the mean and the reduction in variance due to selecting out the 
component proportional to h x (t) and leaving all other components unobserved. 



Dynamics conditioned on the filtered stimulus 

While the dynamics are deterministic because the realization of the input is assumed to be known, in 
transitioning to the LN model framework, stochasticity is introduced by throwing away the influence of 
all history not captured by the filter. The stochastic dynamical process that is statistically equivalent to 
the LN model with a given filter, h x , is the one that describes the evolution of the voltage preceding an 
observed target value of the filtered stimulus. This process evolves according to a conditional transition 
probability, p[v(t')\v(t' — dt),s x (t)\. We develop this section for the EIF model. LIF-specific results 
follow in the A-)0 limit. 

We find the conditional transition probability by marginalizing over the input current given s x (t) : 
p[v(t')\v(t' - dt),s x (t)] = j di(t' - dt)p[v(t')\v(t' - dt),i(t' - dt)]p[i(t' - dt)\s x (t)], 



where p[i(f - dt)\s x (t)] is given by Eq. {69]) and p[v(t')\v(t' - dt), i(t' - dt)] is given in Eq. ([50]). The 
conditional transition probability of the process is 

p[v(t')\v(t' -dt),s x (t)] = 

(^v(t')-Z,(t'-dt)-dl[- V (t'-dt)+ Vo +f(z,(t'-dt))+B m (t)h x (t-t'+dt)]Y 

R[V S - V(t - dt)\ ^ 2.2ii( 1 _ h;c(t _ t / +dt) 2^) 

2na 2 f (l-h x (t-t' + dt) 2 f) 

(v(t')-v r - ii±[-„ T .+„ +/( UT .)+ Sx (t)h x (t-t'+dt)]) 2 

R[v(t -dt)-V s \ ^ 2<T 2M^- h:c{t - t/ + dt) 2M) , 7Q s 



27ra2f (1- h x (t-f + dtff) 
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Eq. (|70[) in principle gives complete information about the evolution of the ensemble of voltage trajectories 
given an observed value of the filtered stimulus. 

To compute the rate estimation function in Eq. ©, we need p[v(t), v(t — dt)\s x (t)] . We factor this into 
two parts: 

p[v(t),v(t-dt)\s x (t)] = p[v(t - dt)\v(t),s x (t)]p[v(t)\s x (t)]. 

The first term is the backward conditional transition probability. Because of the reset, there are two ways 
v(t) can be arrived at from v(t — dt): directly from nearby voltages or indirectly via the instantaneous 
reset. The backward conditional transition probability is thus of the form 

p W[v(t' dt)\v(t'),s x (t)] +pW[v(t' - dt)\v(t>) > v s , Sx (t)]pW[v r \v(t'), Sx (t)] 
p[v(t -dt)\v(t),a x (t)\ - ! N[v(t')} ' 

(71) 

where Af[v(t')] is the normalization constant such that Jdv(t' — dt)p[v(t' — dt)\v(t'), s x (t)\ = 1, and 
p^[v(f — dt)\v(t'), s x (t)] is the free backward conditional transition probability ignoring the reset. The 
forward evolution of the free system during a single time step is statistically reversible because it is 
Gaussian |66| , so the free backward conditional transition probability is the time- reversal of the forward 

(„(*' -dt)-u(t')- di[- t .(t') + i. + /("(f ')) +=* (tjhxlt-t' + dt)]) 2 

p<°> [v(t'~ dt)\v(t'), s x (t)] oce 2 „2M^ hx(t _ t , +dt) 2M) ~ (?2) 

In the domain of integration required for the rate estimation function, v(t) > v t h.a, the second term in 
Eq. (|7Tj) is always negligible when Vth,a — v r ^> cry dt/r. 

The voltage estimation distribution, p[v(i)|sa;(i)] , can formally be found by repeated application of the 
the forward conditional transition probability evolved from the steady-state voltage distribution in the 
distant past: 

p[v(t)\s x (t)] = [ dv(t-dt)... [ dv{0) P [v{t)\v{t-dt),s x (t)] ...p[v{dt)\v(0),s x (t)]p[v(0)], (73) 



where p[u(0)] is the steady-state distribution in Eq. (|46l) . With Eqs. ((HJ and (|9|), we have a formal 
solution in terms of integrals for the rate estimation function given a choice of the filter. In later sections, 
we demonstrate some asymptotic results based on moments of p[w(i)|sa;(t)] . 

To simplify the expression for the estimated firing rate in Eq. ((8]), we perform the integral over v(t — dt) 
using Eq. ((TTj) to obtain 

T> r M , 1 r, M l A - Vth, a + f[~v{t) +Vg + f (V(t)) + S x (t)h x (dt)] \ 

Ra[s x (t)} = — / dv(t) -eyfc \ , p[v(t)\s x (t)\, 

dtJ ^ 2 V ^f(i-h x (dt)2f) ) 

where we assume that Vth,a *C v s . Because the variance in the complementary error function goes to zero 
with dt, the error function only has very narrow support near vth,a m the domain of integration, and so 
we can replace the error function with a delta function at Vth,a under the integral: 

RvMt)] « - f dv(t)S[v(t)-V th ,a]p[v{t)\s x {t)]. 
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The amplitude of the delta-function, A, is determined by the integral of the complementary error function 
above threshold. Using the definition of the stochastic dynamical threshold in Eq. ^ and keeping only 
the leading order in y/dt/r, 

1 / w(t) - v tKa + $[-v(t) + v + f (v(t)) + s x (t)h x (dt)} 



A = J dv(t) ierfc | 



2a 2 f {l-h x {dty f) 



dv (t) I erfc ( "Wj^ + err i (2C _ 1} 
2 \ a /2ct 2 ^ 



/3(C) 



cr 2 <it 



2ttt ' 
where: 

/3(C) = e-H^c-i))- + 2 (C - l)V^crF 1 (2C - 1) . (74) 

/3(C) is 0(1 - C) for 0.05 < C < 0.95. The final result for rate estimation function for the EIF model 
is given in Eq. (fT0|) . As discussed near Eq. (fTOj) . the coefficient ^4 can be interpreted as defining the 
effective size of the set of voltages that correspond to a spike time. 

The LIF model result presented in Eq. Q also follows from this more general development from the 
EIF model when A — > 0. In the limit, the stochastic dynamical threshold, v t h.a goes to the deterministic 
threshold, v t h, for all a. Similarly, the defining equation of the LN model of the EIF model in Eq. ([8]) 
reduces to the equation for the LIF model in Eq. ([3]) because the integral over dv(t — dt) is always one 
in the LIF limit. 



Adaptation of the optimal filter 

To see how the filtered stimulus acts as a linear estimator of the voltage, we use the "inverse" of the 
trick used in Eq. (|16[) to put the filtered stimulus into the dynamical system. We formally integrate 
the dynamical model, as shown in Eq. (|44j). and add and subtract from the right side a(/i~ 1 * s x )(t) — 
a y l ri% * s x) (i)> where /i" 1 is the inverse convolution operator for the membrane filter, from Eq. (|64p . and 
a is a normalization constant. Then, 

v(t) =v a + as x (t) + e{t), (75) 
pt dt' 



( t )= / ^e^ 1 i(t>) + f(v(t')) -{vs-v^TR^-a^ 1 *s x ){t') . (76) 



Eqs. (|75|) and (jTB"]) are exact, but are in the suggestive form of a linear instantaneous estimate of the 
voltage, v a + as x (t), plus an "error term," e(t), that depends on all of the history of the stimulus and 
the nonlinear currents up to time t. The dimensionality reduction and corresponding loss of determinism 
going from the nonlinear spiking system to the LN model is the result of throwing away all history that 
cannot be accounted for by the instantaneous value of s x (t). The optimally predictive filter will "explain" 
as much of the error term as possible when the voltage is at threshold, as represented by the uncertainty 
reduction due to maximizing the information in Eq. (|59[) . 



While we cannot analytically solve the information optimization problem in Eq. (|12[) outside of the limits 
presented in the body text, we can gain insight by examining the filter that approximately minimizes the 
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variance of e(i), given that the voltage is below threshold. By definition, minimizing the variance of the 
error term is equivalent to maximizing the correlation of the voltage and the filtered stimulus since 

Vax[e(t)\v(t) < v t , ha ] = Var[v(i) - as x (t)\v(t) < v thi0 ] , 

= Vax[v(t)\v(t) < v t h,a\ + Var [as x (*)| v(t) < v th ^] - Gov[v(t)as x (t)\v(t) < v th>a ] , 

and a is chosen so that Var[as x (t)\v(t) < vth.a] = Vax[v(t)\v(t) < vth.cr]- We find the filter that mini- 
mizes the variance of the error term by using a self-consistent approximation called stochastic linearization 
that we have treated previously [25]. We denote the stochastic linearization filter hi(t), and it solves the 
optimization problem: 

hi(t) = argmax Var[e(i)|v(i) < v t h.<j\ , (77) 

M*) 

under some assumptions. 



Stochastic linearization filter for the LIF model 

We start with the LIF model because all of its dynamical evolution takes place below threshold and so 
Var[e(£)|i;(f) < Vth,a] = Var[e(i)]. For one-dimensional dynamical models, the stochastic linearization 
approximation looks for the most predictive exponential filter with a er-dependent time constant. We 
expect there is a good single exponential approximation to the optimal filter for the following reasons. 
The dynamical models are stochastic processes, and all correlation functions can in principle be derived 
from the corresponding Fokker-Planck equation. As the Fokker-Planck equation is linear, it can be 
decomposed into eigenmodes that are a function of the input standard deviation, and all statistical 
properties follow from the eigenmodes [56]. For example, steady-state statistics follow from the eigenmode 
with zero eigenvalue. For excitable one-dimensional models, the eigenvalues are real and the spectrum 
is discrete [68]. Thus, low-order statistics have long time behavior dominated by the lowest non-zero 
eigenvalue, and the optimal filter must account for those statistics. So, on general grounds, we expect 
the optimal filter to have exponential long-time behavior. 

We search for an exponential filter with dimensionless inverse time constant k a . For an exponential 
filter, we can make a substitution using an identity that follows immediately from Eqs. (|€>2|) and (|64[) : 
i(t) = Tcts v + k a as v , where a is a normalization constant. The variance of the error term, Eq. (|76[) . 
is: 



Var[e(i)] 



T 



^ dt" t' +t" -2t 
e 

T 



{k a - If a 2 (s v {t') Sv {t")) + (V th - V r f T 2 ^ RR {t' - t") 



2(fc CT - 1) {Vth ~ Vr) ™ (R(t')s v {t")) 



(78) 



where 4>RR,(t' — t") = ((R(t')R(t")) — R 2 ) is the rate autocorrelation function. Unfortunately, the corre- 
lation functions involving the rate are unknown. 

In stochastic linearization, this closure problem is solved by assuming that the variance of the error term 
is identically zero. The approximation proceeds in two steps. First, since Var[e(i)] = Var[w(i) — as v (t)], 
under the assumption that the variance of the error term is identically zero, it follows that as v (t) = 
v(t) — (v). Second, one assumes that all the correlation functions that appear in Eq. (|75|) have the same 
time dependence, which we denote g(t' — £"); this assumption is good when the correlation functions are 
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dominated by the lowest non-zero eigenmode discussed previously. Under these assumptions, 

a 2 (s v (t')s v (t")} = {(v 2 ) - (v) 2 ) g(t' -t"), 



a 



(R(t')s v (t")) = ((Rv) - R a (v)) g(t' - t") 



<>RR 



(t'-t") = ((R 2 )-Rl)g(t'-t") 



The rate- voltage cross correlation is (Rv) = R a (v |sp) = R a vth- In this approximation, the variance of 
the error term is 



Var[e(i)] oc 



(k a - l) 2 ((v 2 ) - (v) 2 ) - 2{k a - 1) (v th - v r ) rR a (v th - (v)) 
+ (v th -v r ) 2 r 2 ((R 2 )-Rl) 



dt f dt t'+ t "-2t 

— / — e - g{t - t ). (79) 

T I T 



Minimization with respect to gives the result in Eq. (j2"2")l . 



For one-dimensional models, the stochastic linearization filter differs from the exponential membrane 
filter. For the LIF model, it is important to realize that the membrane filter, corresponding to k = 1, 
is not in the eigenspectrum for finite a because of the reset nonlincarity (or equivalcntly, because of 
the non-natural boundary conditions in the Fokker-Planck formulation) and thus does not appear in the 
relevant correlation functions. This explains why the membrane filter is sub-optimal for predicting the 
response for larger input standard deviations. 

As anticipated by the eigenmode argument, the long time behavior of the low-order correlation functions 
is given by the timescale defined in Eq. (|22[) as can be seen from the spike-triggered average current, the 
spike-triggered voltage, and the rate autocorrelation function. This is verified in Fig. |9] 



The reader surely noticed some sleight of hand in the minimization of Eq. (|79j) . The filtered stimulus s v 

is an OU-process, and so g(t' — t") ~ e~^~^ '. Why do we not minimize over the k a dependence of the 
integrated g(t' — t")l It is incorrect to vary over the fc CT -dependence in g because doing so introduces a 
purely multiplicative degree of freedom into the minimization of Var[e(i)]. This is equivalent to assuming 
that the variance of the voltage is undetermined prior to minimization. In lieu of introducing a Lagrange 
multiplier to constrain the multiplicative scaling, we held the degree of freedom fixed. Discussion of this 
issue and related subtleties has been extensive in the stochastic linearization literature. References [69H7T] 
are among the more accessible; our approach is 'true' linearization as described therein. 



Semi-empirical closed form for the STA of the LIF model 

At short times prior to the spike for small time steps dt, Paninski has shown [21] (see also Badel et 
al. |24j ) that the STA has a square- root singularity at short times that, in our notation, is 



2 f dt 



1/2 

(i(t)\sp) = ad - [ — \ for-T<-f<0. 

This component of the STA is entirely determined by the threshold boundary and is independent of the 
subthreshold dynamics. For long times prior to the spike, the stochastic linearization technique describes 
the noise- modified effective linear dynamics, an OU-process: tv = — k a (v — (v)) + I(t). If there was 
no boundary effect, the STA of the effective linear dynamics (the average input current conditioned on 
hitting vth) would be: 

(i(t)\sp) = 2 {v th ~ (v)) for t < -r, 
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Figure 9. Stochastic linearization predicts the lowest non-zero eigenmode of the driven 
dynamics. We show second-order correlation functions for the LIF model, on a log-linear plot with the 
y-axis in arbitrary units, scaled and shifted for comparison. Rate autocorrelation function (dashed), 
spike-triggered average voltage (dash-dot), STA current (solid), and the stochastic linearization filter 
(dotted). The upper data set corresponds to a = 8, with fcg = 2.5 and the lower, a = 0.45 with 
fco.45 = 1.06. The long time behavior the correlation functions is governed by the timescale identified in 
Eq. (|22j) . The decay of the spike- triggered voltage reflects the decorrelation of the voltage with the 
firing rate over longer timescales. 



as follows immediately from Eq. (|62[) with appropriate normalization. 



The method of matched asymptotic solutions |72j can provide an accurate closed form approximation to 
the STA for all times. In this method, each solution is used where it is valid, and a complete solution 
is found by matching the two at an empirically chosen intermediate point, t m . Since the boundary term 
behavior is power-law, it decays very slowly relative to the exponential long-time behavior and so it needs 
to be cut off at the matching point. The amplitude of the exponential component is determined by the 
amount of linear subthreshold integration required to bring the voltage near enough to threshold so that 
a fast fluctuation in few time steps can bring the voltage above threshold. With the aid of simulation, 
we find that an excellent fit to the STA for sufficiently small dt is given by: 



(i(t)\sp) 



a a e 




for t <t m , 
for t m < t < 0, 
for t > 0. 



(80) 



Coefficient a a and the optimal matching time depend on the ratio of Vth — (v) to a •*/ r/dt. In typical 
simulations, <J^f~^ 3> v t h — (v) always. 

For larger dt (although still dt <C r), Eq. (|8U|) fails even though the qualitative behavior of the STA 
remains the same: empirically, the square-root singularity is modified. An excellent fit can still be found 
if we allow the exponent and matching time to depend on the time step. We find that the general 
semi-empirical closed form is 



(i(t)\sp) 



h 

a a e~ 




fc<7 W 



ML 



for t < tm, 
for t m < t < 0, 
for t > 0. 



(81) 
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Coefficient a„ scaies as 



2 (vth - v ) 



(vth-M) 



for fx — ^ 0, 

for cr^y^ > vth - (v), 

(vth -{•")) 



with Gb] - 2^-7^ and ^ - 2.2 (G[xl - 0.1) for ^h^Ll < q.4 empirically; the numerical coefficient 
in G[x] is weakly c-dependent. The exponent scales logarithmically as 

v w i + 0.151nfl + 660*^ , 

although we find that the fit is indistinguishable for v ± O.lQ For o\J~^- too large, the STA begins to 
oscillate below this solution for small times (not shown). While the result in Eq. ([50)) in the limit dt — > 
stands on elementary theoretical grounds, we are not aware of a principled argument that produces the 
anomalous scaling of the exponent with dt. We share this result to add it to the cabinet of curiosities 
associated with the LIF model. Comparison with typical simulation results is shown in Fig. 1101 




Figure 10. Semi-empirical closed form for the STA of the LIF model. The result in Eq. (|8Tj) 

is shown (dashed black) over the corresponding simulation result for typical examples; y-axis is STA in 
terms of MSJse).. a. Fixed standard deviation, varying time step: a = 2, 

f = {40~ 1 ,80~ 1 ,400- 1 ,800- 1 } : purple, green, orange, red respectively. G[x] = 1.9 ^"ffi , fe = §. 
B. Fixed time step — = 40^, varying standard deviation. Color code as in previous figures. 



Stochastic linearization filter for the EIF model 

For the EIF model, the derivation of the stochastic linearization filter is more subtle because of the 
condition that we only optimize over v(t) < Vth.a- To simplify the error term in Eq. (|76p. the spike- 
generating current, f(v), is broken into two parts. Below the dynamical threshold, Vth, the detailed shape 
of f(v) is critical for spike initiation and the integration of the input. However, above Vth, f(v) drives the 
spike rapidly and then the system is returned below threshold by the reset; super-threshold, the details 
of f(v) are largely irrelevant to the evolution of the voltage between spikes. Previous spikes affect the 

7 For dt = t/800, the observed logarithmic scaling predicts v w 0.6. In this case, change in the goodness of fit for the 
small correction from v = 1/2 can easily be compensated by small changes in the amplitude and matching time. 
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instantaneous state of the voltage through low pass filtering with the membrane, and so the net effect 
of the super-threshold currents is to displace the voltage on average by an amount proportional to the 
distance between rest and threshold and proportional to the mean firing rate: 

* — [f(v(t')) H[«(0 - v th ] - (v a - v r ) TR(t')} « - (Vth - Vr) rR ai 

T 

and so, inside the above integral, we can take: 

/(«(/)) K-W) - V th ] ~ (V S - V r ) TR(t') « - (v th - V r ) TR(t'). 

For voltages below vth, the form of f(v) is relevant and so we cannot simplify it. Thus, the subthreshold 
error term is approximately: 



e(t) 





[* dt' jLzt 


v(t)<v thi<J 


JO 



i{t') + f(v(t')) R[v th - v(t')} - (v th - v r ) rR(t') - a^ 1 * s x ) (t')J • (82) 

as for the 



The stochastic linearization filter can now be found from minimizing the variance of Eq 
LIF model. The result is shown in Eq. (|25p and an example is shown in Fig. 03. 



Moment-based asymptotic results 

Because of the non-Markov structure of the conditional voltage process, the formal solution for j^y(t) | s x (<)] 
in Eq. ([75]) is difficult to use. Useful asymptotic results are more accessible from studying the moments 
of p[i;(f)|s x (£)] directly. Using Eq. (|44[) and the conditional input ensemble, we can study the lower 
moments; the conditional mean, 

(v(t)\s x (t))=v + f—e^ [(i{t')\ S S)) + {f(v(t'))\ Sx {t))-{v s -Vr)T(R(t')\ Sx (t))] , (83) 
Jo T 

and the conditional variance, 

0„ 1sx (t', t"; t) + cf> ff]s „ (t',t"; t) + {v s - v r ) 2 T 2 <j, RR \ Slt (t', t";t) 

+ 4>if\ Sx t"]t) -2(v s - v r ) T(j> fRls!c (if, t"- 1)-2 (v s - v r ) r^ja| a . (£', t) , 

(84) 

where <f>u\ 3a (t', 4"; t) is given in Eq. f|68[) . All correlation functions are defined with mean subtraction as 
in Eq. (|66p : 4>RR\ Sa . (t'it"] t) and <j>ff\ Sx (t' ,t";t) arc stimulus conditioned rate and f(v) autocorrelation 
functions; 4>if\ Sa . (f , t"; t), <j>fR\ Sx (f , t"; t), and </>_Ri| Sx (tf , t"; t) are the stimulus conditioned cross-correlation 
functions. Higher moments in principle follow similarly. In general, the moments are no more useful than 
formal solution for the conditional process in Eq. (|70|) because of the unknown correlation functions. 
However, in limiting cases, analytic results can be derived from the low moments alone. 



r , . I , ,n f dt' f dt" t' + t" 

Var [v(t) | s x {t)] = — e " 

Jo T Jo r 



LIF model: small input standard deviation 

In the low variance limit, spikes are well-separated, and in the run up to a spike, previous spikes are 
rare for most filtered stimuli. In this limit, p[v(i)|sa;(i)] is approximately Gaussian for non-spiking s x (t), 
where almost all the variance in the voltage estimate is due to the difference between the optimally 
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predictive filter and the membrane filter of the LIF model. In this limit, the leading order conditional 
mean and variance in Eq. (|83[) and (|84[) are (suppressing t): 

(h m * h x ) 



V2 



Var^o [v\s x ] 



r 



1 - (h m * h x y 



where h m is the intrinsic membrane filter defined in Eq. 
h x = h m , reproducing the result in Eq. (jXTJ) . 



5J). For g — > 0, the variance is minimized when 



In the small noise limit, for values of the filtered stimulus s m < Sth, a Gaussian (limiting to a delta- 
function) based on the above moments describes p[u|sa;]. For small-but-finite a, the Gaussian model 
will be perturbed because of spiking, taking the super-threshold probability predicted by the Gaussian 
model and spreading it near the reset, v r . Thus, for small a, the full conditional distribution will look 
like: 



t 



^271-Var^o [v\s x ] 



oM=xi H [vth — v] + residual \y | s x ] 



(85) 



where the residual accounts for the probability that has been reset and depends on the ignored rate- 
current and rate-rate correlation functions. Empirically, this formula holds through the onset of spiking 
at Sth with low probability residual. For increasingly large values of the filtered stimulus, more than 
one spike is possible in the correlation time of the filter and so this relation, based on ignoring the spike 
autocorrelations, breaks down. 



The rate estimation function following from Eq. ([85)1 follows from the general solution in Eq. (|T0|) . As 
discussed near Eq. (JXTJ) , the rate estimation functions for the membrane filter arc sharply peaked at 
Sth and are approximately independent of the input statistics for a < 0.6. Results are shown in Fig. 

nn 



LIF model: large input standard deviation 

We use a couple of tricks to directly derive an asymptotic expression for R a [s x (t)] for sufficiently large 
values of s x . When s x is sufficiently large, many spikes will occur during the integration window of the 
filter. Accordingly, the conditional mean voltage given large s x must be Vth ~ Vr since the voltage trace is 
either at threshold, reset, or rapidly transitioning between the two. The mean in Eq. (|83p becomes 

^V^=«o+ f —e^(s x (t)h x (t-t')-{v th -v r )r(R(t')\ Sx {t)). (86) 
^ Jo T 

In this limit, for (R(t')\s x (t)}, we take the ansatz 

h x (t - f + dt) \ . h x (t-t' + dt) 

(R(t )\s x (t)) ~ Ra ( 1 — 7T \ I + Ra[s x (t)} . (87) 

\ IliaXl li x I J JlldAI IL X I 

This ansatz is based on three ideas. First, consistency between the LN and dynamical models requires 
that the mean conditional rate at t = t' must be the LN model rate, Rcr[s x (t)]. Second, for t' t, the 
mean conditional rate must go to R a , independent of s x (t). Third, the time dependence of the mean 
conditional rate is the same as that of the filter. This must be approximately true because, at large s x (t), 
the dynamics are saturated and the conditional mean voltage must be equal to v * h ~ Vr for any large s x 
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Figure 11. Asymptotic results small a: threshold detection in the LIF model A. Numerical 
conditional voltage distribution for a — 0.45 for the membrane filter, p a [i>|s m ] . axes in units of Vth — v . 
Color shows shows probability density. In the regime below the threshold filtered stimulus, the residual 
is two or more orders of magnitude below the primary (singular) Gaussian component. B. Rate 
estimation functions for the membrane filter, R a [s m }, for a = {0.45, 0.5, 0.55}. In this regime, the LIF 
model is primarily a detector of the passage to threshold with s tn = \/2(vth ~ v ). For larger stimuli, 
the origin of the fall-off and return to spiking with increasing s m can be seen in the voltage distribution 
in panel A as the mass of the prediction at large s m cycles between v r and Vth ■ 



and for times significantly prior to t during the support of the filter — the conditional mean voltage is 
independent of s x and is approximately independent of t. We have been unable to derive this from more 
rigorous grounds but verify its validity numerically in Fig. IT2"B. Eqs. (|55|) and (|87[) will cease to hold for 
values of the filtered stimulus that are small enough so that the conditional mean voltage need not hold 
steady at 1 ' tfc 2 ~ tv . 

The rate estimation function in the limit that a is large compared to v tn — v and Vth — v r , follows from 
Eqs. (|gp) and ([57)): 

Ra[sx(t)} sJt) ... ,- , / m&x{h x )\ f v th - 2v a - v r \ max(h x ) 
-max(/i x ) y/ir + 1 



c V h m * h x ) \ 2cr 

This formula describes the estimated rate over most of the dynamic range of the neuron. Note that at 
large a, the last term goes to zero and the formula acquires the contrast-invariant form reported in Eq. 
(Pfj)) . Comparisons with numerical results are shown in Fig. [T2] 

Similarly, the optimally predictive filter becomes contrast invariant at high a. Using the results in Eqs. 
(|47| and (|48|) in Eq. (|22[) , the exponential time scale in the continuous time limit is 

- 1 1 K ' -1 + -«2.75. (89) 



Var[w 



For dt = t/40, we find in simulation that rj 2.5 because the mean firing rate is reduced for finite 
input correlation time [73] ■ The result for the STA in this limit is given in Eq. 
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Figure 12. Asymptotic results at large a: perfect contrast gain control in the LIF model. 

A. Normalized rate estimation functions based on the STA filter, and C, stochastic linearization filter, 
hi, for 4 < a < 10. Asymptotic prediction from Eq. (|88|) is dashed. The limiting perfect contrast gain 
control property of the LN model is filter- independent, reflecting the global origin of the phenomenon. 
The stimulus threshold scales with a and is Sth/ & ~ —\/2{v) jo = y/2/n, consistent with the effective 
operating point being located at the mean subthreshold voltage; the less predictive filter necessarily has 
a lower threshold (panel C). B,D. Example verification of the conditional rate ansatz in ([87]) . shown for 
a = 6. (B) STA filter; (D) stochastic linearization filter. Colors show simulation results for 



(R(r 

(R(t'] 



s x (t)); dashed line shows the filter. At sufficiently large values of s x , the shifted and normalized 
s x (t)} is approximately the filter (red ^ = 3.3; orange — = 2.2). The ansatz breaks down for 



smaller inputs (green — = 0.66). 



Higher order moment constraints for contrast gain control 

The shifted moment scaling relations in Eq. (f3"5j) are satisfied by the LIF model in the large a limit. We 
show the first moment explicitly in Eq. (f37|) . Here, we argue that higher order moments scale correctly 
as well, provided the first moment scales correctly. We discuss the LIF model (f(v) = 0) first. Each 
higher order shifted moment involves the averaged sums of products of the different terms in Eq. (|44[) . 
Hcuristically, 

((Vth,a - V) n ) = a n fi n ~ k ™ * + * + h ™ * 4>RH»-* + O* >n-pi"p, 

l<p<n 

where hm * is n-fold convolution with n copies of the membrane filter, <pin is the ?i*' i -order input 
autocorrelation, cf>^n is the n*' l -order rate autocorrelation, HPi n- P are the rate-input cross-correlations 
of combined order n, and a n /j, n - P iip are the products of the lower-order moments. 

To understand the scaling of each shifted moment, we identify the scaling of each term. First, for moment 
n, if each lower moment scales correctly, then the contributions from lower moments, cr n /z n _ p /i p , trivially 
scale correctly. Second, at every order, contributions to the shifted moments due to input autocorrelation 
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functions automatically satisfy the scaling relations since the input distribution is Gaussian with standard 
deviation proportional to a. 

Understanding the scaling of the terms involving the rate requires more finesse. Recall that the mean 
rate in the contrast invariant regime is linear in a. Looking at the order of each term, contributions from 
the lowest order rate-input cross-correlations scale correctly: 

h%> * &B-i ~ 0( ( r)0( ( 7 n - 1 ) ~ 0{a n ). 

Finally, all contributions from rate autocorrelation functions and rate-input cross-correlations of higher 
order than one in the rate are always sub-leading relative to the other terms and can be ignored. We 
start with the second-order rate autocorrelation function. Since the integratc-and-fire models are renewal 
processes, the second order autocorrelation is of the form [2] 

4> R R{t-t')=R <T g{t-t'), 

with integrablc g. Convolved twice with the membrane filter, the contribution of this moment is O(R) ~ 
0(a), which is lower order than the 0(R 2 ) ~ 0(a 2 ) expected from naive power counting. Higher 
order powers of R are sub-leading relative to naive power counting as well. Thus, contributions from 
higher order rate correlation functions to the shifted moments are sub-leading for large a, contributing 
at most: 

ht ] * to*-* ~ 0(R p ^)0(a n ^) ~ ©(a"- 1 ) « 0(a n ), 

for n > 2 and p > 2. To summarize, when the first shifted moment obeys the perfect gain control 
constraint, each higher order moment is guaranteed to obey its constraint as well. 

Moment constraints for contrast gain control in the the EIF model 

Results for the EIF model follow similarly: once we have control over the first shifted moment, the others 
follow for the reasons above. To calculate the first shifted moment, we need the mean subthreshold 
voltage. From Eq. (gU) , tms is 

(v(t)\v(t) < v th , a ) =v + — [-(v s -v r )T(R{t')\v(t) < v th . a ) + {f(v(t')) \v(t) < v th .*)} ■ 

Jo T 

We can simplify this using the trick introduced preceding Eq. (|82p : because of the low pass filtering of 
the membrane, the contributions of the reset and f(v) when the voltage is above the unstable fixed point 
are fast and opposite and thus approximately cancel, leaving only the persistent part contributions below 
Vth- This approximation gives 

(v{t)\v{t) <v th , a ) ^v -{v th -v r )TR a + / — e^ 1 (f(v(t'))\v{t) < v th ) . 

Jo T 

The first shifted moment from Eq. (|35[) is 

rt ^-jJ t ,_ t 

VfJ-1 ~ Vth.cr -v a + (Vth ~ Vr) tR - I — e~ (/ (v(t')) \ v(t) < V t h) ■ 

Jo T 

Because f(v) is nonlinear in the voltage, the integrated average will in principle involve terms of all 
orders in a and so this scaling relation can never be exactly satisfied, even asymptotically, unless f(v) = 
0. However, there are two limits in which we can handle f(v) and both lead to a similar functional 
result. 
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First, for A <C v t h — v a , the contribution from f(v) below threshold because f(v) is small compared to 
the explicit leak below threshold, and so we can ignore it. Also, Vth,a grows approximately linearly from 
Vth with er, as shown in Eq. ([7]). Accounting for the scaling of the stochastic threshold takes /ii to a new 
constant, fx±, and v t h,a to Vth in the previous equation. Thus, for the EIF model to show approximately 
perfect contrast gain control for small A, it must also obey a linear rate constraint of the form: 

R a T = a , (90) 

Vth ~ v r vth ~ v r 



where [i\ is determined by the model parameters; for the LIF model (A — > 0), comparison with Eq. (|47p 
gives jli = 1/^/tt. The stochastic threshold drops out of this expression because contrast gain control is 
a global phenomenon and does not depend on the exact definition of spike times. 

For v t h — v < A <C v s — v Q , f(v) is no longer smallH For hyperpolarized voltages, v < v — A, f(v) below 
threshold is primarily linear and increases the leak, as shown in Eq. (|43p . The first shifted moment can 
be re-expressed in terms of the hyperpolarized leak time constant, tj,: 

. t' —t 

f dt' e~^~ 

CTfJ-l « Vth.a -V D + (V t h - V r )T L R a - / (f(v(t')) - (v - VojfLr^lvit) < V th ) , 

JO T L 1 J-oo 

where f'_oo is the limiting slope of f(v) for large negative voltages. For larger A, t£ — ^ 0, so the time 
integral is supported only near t' = t. Also, f(v) — ¥ vth — v Q for v Q — A < v < Vth + A. Thus, in the 
limit, for a ~ v t h ~ v , we have: 

* *' — < 

i dt' e^L~ 

/ — : ' 1 (f(v(t')) -(V- Vo)/^^) < Vth) « V th ~ V , 
JO T L 1 J-oo 

and the first moment constraint becomes 

anx w v t h,<7 - Vth + (vth ~ v r ) t l R<j. 

After accounting for the linear scaling of the stochastic threshold, the mean rate constraint derived from 
the voltage distribution scaling for larger A is 

R aTL = a Ml . (91) 
Vth - v r 

Comparison of Eqs. (|31|) and (|9"Tj) shows that perfect contrast gain control is guaranteed for A ~ 1. We 
discuss this in more detail in Section: Contrast gain control in the EIF model. 

The derived constraints on the rate based on the scaling of the voltage distribution can be summarized 
across A as a linear constraint with variable intercept: 

D Pi (V th -V \ 

R a r = a v x , (92) 

Vth —V r \ Vth -V r J 

where v\ is a constant between zero and one and is determined by the model parameters. 



8 The limit of A ~ v s — v is unphysiological because the excitability of f(v) and separation between integration and 
spiking disappears: the model becomes equivalent to a leaklcss integrator with a reflecting boundary at v and absorbing 
threshold at v s . 
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